Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-08-03 09:21:45 +00:00)

Merge branch 'main' into docs-4

Commit d7c976c6d2
38 changed files with 4709 additions and 8876 deletions
docs/_static/css/my_theme.css (vendored, 3 changes)
@@ -20,3 +20,6 @@
 h3 {
   font-weight: normal;
 }
+html[data-theme="dark"] .rst-content div[class^="highlight"] {
+  background-color: #0b0b0b;
+}
docs/_static/js/detect_theme.js (vendored, new file, 9 lines)
@@ -0,0 +1,9 @@
+document.addEventListener("DOMContentLoaded", function () {
+  const prefersDark = window.matchMedia("(prefers-color-scheme: dark)").matches;
+  const htmlElement = document.documentElement;
+  if (prefersDark) {
+    htmlElement.setAttribute("data-theme", "dark");
+  } else {
+    htmlElement.setAttribute("data-theme", "light");
+  }
+});
@@ -112,6 +112,8 @@ html_theme_options = {
     # "style_nav_header_background": "#c3c9d4",
 }
 
+default_dark_mode = False
+
 html_static_path = ["../_static"]
 # html_logo = "../_static/llama-stack-logo.png"
 # html_style = "../_static/css/my_theme.css"

@@ -119,6 +121,7 @@ html_static_path = ["../_static"]
 
 def setup(app):
     app.add_css_file("css/my_theme.css")
+    app.add_js_file("js/detect_theme.js")
 
 
 def dockerhub_role(name, rawtext, text, lineno, inliner, options={}, content=[]):
     url = f"https://hub.docker.com/r/llamastack/{text}"
@@ -7,13 +7,13 @@ In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a
 
 First, create a local Kubernetes cluster via Kind:
 
-```bash
+```
 kind create cluster --image kindest/node:v1.32.0 --name llama-stack-test
 ```
 
 Next, create a Kubernetes PVC and Secret for downloading and storing a Hugging Face model:
 
-```bash
+```
 cat <<EOF |kubectl apply -f -
 apiVersion: v1
 kind: PersistentVolumeClaim

@@ -39,7 +39,7 @@ data:
 
 Next, start the vLLM server as a Kubernetes Deployment and Service:
 
-```bash
+```
 cat <<EOF |kubectl apply -f -
 apiVersion: apps/v1
 kind: Deployment

@@ -95,7 +95,7 @@ EOF
 
 We can verify that the vLLM server has started successfully via the logs (this might take a couple of minutes to download the model):
 
-```bash
+```
 $ kubectl logs -l app.kubernetes.io/name=vllm
 ...
 INFO:     Started server process [1]

@@ -119,7 +119,7 @@ providers:
 
 Once we have defined the run configuration for Llama Stack, we can build an image with that configuration and the server source code:
 
-```bash
+```
 cat >/tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s <<EOF
 FROM distribution-myenv:dev
 

@@ -135,7 +135,7 @@ podman build -f /tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s -t
 
 We can then start the Llama Stack server by deploying a Kubernetes Pod and Service:
 
-```bash
+```
 cat <<EOF |kubectl apply -f -
 apiVersion: v1
 kind: PersistentVolumeClaim

@@ -195,7 +195,7 @@ EOF
 ### Verifying the Deployment
 We can check that the Llama Stack server has started:
 
-```bash
+```
 $ kubectl logs -l app.kubernetes.io/name=llama-stack
 ...
 INFO:     Started server process [1]

@@ -207,7 +207,7 @@ INFO:     Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit
 
 Finally, we forward the Kubernetes service to a local port and test some inference requests against it via the Llama Stack Client:
 
-```bash
+```
 kubectl port-forward service/llama-stack-service 5000:5000
 llama-stack-client --endpoint http://localhost:5000 inference chat-completion --message "hello, what model are you?"
 ```
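The same smoke test can also be issued programmatically. A minimal sketch using the `llama-stack-client` Python SDK, assuming the port-forward above is active; the model ID is a placeholder, so list the server's models first and substitute a real one:

```python
from llama_stack_client import LlamaStackClient

# Connect through the kubectl port-forward started above.
client = LlamaStackClient(base_url="http://localhost:5000")

# List the models the server knows about (useful to find the exact model ID).
for model in client.models.list():
    print(model.identifier)

# Issue the same chat-completion request as the CLI example.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-1B-Instruct",  # hypothetical; use an ID from the list above
    messages=[{"role": "user", "content": "hello, what model are you?"}],
)
print(response.completion_message.content)
```
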
@@ -25,7 +25,7 @@ The `llamastack/distribution-remote-vllm` distribution consists of the following
 | vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
 
 
-You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference.
+You can use this distribution if you want to run an independent vLLM server for inference.
 
 ### Environment Variables
 

@@ -41,7 +41,10 @@ The following environment variables can be configured:
 
 ## Setting up vLLM server
 
-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.
+In the following sections, we'll use either AMD or NVIDIA GPUs to serve as hardware accelerators for the vLLM
+server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
+[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
+that we only use GPUs here for demonstration purposes.
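Whichever accelerator is used, the resulting vLLM server exposes an OpenAI-compatible API. A minimal sketch for smoke-testing it from Python, assuming the server listens on localhost:8000 (the default) and that the model name matches what vLLM was started with:

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible endpoint; an API key is required by the
# client but typically unused by a local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",  # hypothetical; must match the served model
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```
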
 
 ### Setting up vLLM server on AMD GPU
 
@@ -103,7 +103,5 @@ llama stack run together
 
 2. Start Streamlit UI
 ```bash
-cd llama_stack/distribution/ui
-pip install -r requirements.txt
-streamlit run app.py
+uv run --with ".[ui]" streamlit run llama_stack/distribution/ui/app.py
 ```
@@ -36,9 +36,7 @@ llama-stack-client benchmarks register \
 3. Start Streamlit UI
 
 ```bash
-cd llama_stack/distribution/ui
-pip install -r requirements.txt
-streamlit run app.py
+uv run --with ".[ui]" streamlit run llama_stack/distribution/ui/app.py
 ```
 
 ## Environment Variables
@@ -19,6 +19,7 @@ class LlamaStackApi:
                 "together_api_key": os.environ.get("TOGETHER_API_KEY", ""),
                 "sambanova_api_key": os.environ.get("SAMBANOVA_API_KEY", ""),
+                "openai_api_key": os.environ.get("OPENAI_API_KEY", ""),
                 "tavily_search_api_key": os.environ.get("TAVILY_SEARCH_API_KEY", ""),
             },
         )
@@ -1,5 +1,5 @@
 streamlit
 pandas
-llama-stack-client>=0.0.55
+llama-stack-client>=0.2.1
 streamlit-option-menu
-llama-stack>=0.1.9
+llama-stack>=0.2.1
@@ -70,6 +70,9 @@ class ModelArgs(BaseModel):
     attention_chunk_size: Optional[int] = None
     rope_theta: float = 500000
     use_scaled_rope: bool = False
+    rope_scaling_factor: Optional[float] = None
+    rope_high_freq_factor: Optional[float] = None
+
     nope_layer_interval: Optional[int] = None  # No position encoding in every n layers
     use_qk_norm: bool = False
     # Set to True to enable inference-time temperature tuning (useful for very long context)

@@ -92,4 +95,14 @@ class ModelArgs(BaseModel):
             f"n_heads ({self.n_heads}) must be divisible by n_kv_heads ({self.n_kv_heads})"
         )
         assert self.dim % self.n_heads == 0, f"dim ({self.dim}) must be divisible by n_heads ({self.n_heads})"
+
+        if self.use_scaled_rope:
+            # NOTE: ideally these values should have come from params.json. However, we have
+            # shipped the models everywhere. Only Llama-4-Scout uses scaled rope and needs these
+            # specific values.
+            if self.rope_scaling_factor is None:
+                self.rope_scaling_factor = 16
+            if self.rope_high_freq_factor is None:
+                self.rope_high_freq_factor = 1
+
         return self
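A small sketch of how these defaults resolve in practice, assuming pydantic v2 semantics as in the validator above (the `MiniArgs` class is illustrative, not the real `ModelArgs`):

```python
from typing import Optional

from pydantic import BaseModel, model_validator


class MiniArgs(BaseModel):
    # Illustrative subset of the RoPE-related fields above.
    use_scaled_rope: bool = False
    rope_scaling_factor: Optional[float] = None
    rope_high_freq_factor: Optional[float] = None

    @model_validator(mode="after")
    def fill_rope_defaults(self):
        # Mirrors the validator above: only scaled-rope models get defaults.
        if self.use_scaled_rope:
            if self.rope_scaling_factor is None:
                self.rope_scaling_factor = 16
            if self.rope_high_freq_factor is None:
                self.rope_high_freq_factor = 1
        return self


print(MiniArgs(use_scaled_rope=True).rope_scaling_factor)   # -> 16
print(MiniArgs(use_scaled_rope=False).rope_scaling_factor)  # -> None
```
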
@@ -23,37 +23,25 @@ from .ffn import FeedForward
 from .moe import MoE
 
 
+def rmsnorm(x, eps):
+    def _norm(y):
+        return y * torch.rsqrt(y.pow(2).mean(-1, keepdim=True) + eps)
+
+    return _norm(x.float()).type_as(x)
+
+
 class RMSNorm(torch.nn.Module):
     def __init__(self, dim: int, eps: float = 1e-6):
         super().__init__()
         self.eps = eps
         self.weight = nn.Parameter(torch.ones(dim))
 
-    def _norm(self, x):
-        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
-
     def forward(self, x):
-        output = self._norm(x.float()).type_as(x)
-        return output * self.weight
+        return rmsnorm(x, self.eps) * self.weight
 
 
-class L2Norm(torch.nn.Module):
-    def __init__(self, dim: int, eps: float = 1e-6):
-        super().__init__()
-        self.eps = eps
-
-    def _norm(self, x):
-        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
-
-    def forward(self, x):
-        return self._norm(x.float()).type_as(x)
-
-
-def apply_scaling(freqs: torch.Tensor):
-    # Values obtained from grid search
-    scale_factor = 8
+def apply_scaling(freqs: torch.Tensor, scale_factor: float, high_freq_factor: float):
     low_freq_factor = 1
-    high_freq_factor = 4
     old_context_len = 8192  # original llama3 length
 
     low_freq_wavelen = old_context_len / low_freq_factor
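The norm refactor above is behavior-preserving: `RMSNorm.forward` now delegates to the free `rmsnorm` function, so the math is unchanged. A self-contained check of that equivalence (a standalone reimplementation for illustration, not an import from the repo):

```python
import torch


def rmsnorm(x, eps):
    def _norm(y):
        return y * torch.rsqrt(y.pow(2).mean(-1, keepdim=True) + eps)

    return _norm(x.float()).type_as(x)


class RMSNorm(torch.nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return rmsnorm(x, self.eps) * self.weight


x = torch.randn(2, 16, dtype=torch.bfloat16)
norm = RMSNorm(16)
# The old two-method implementation computed exactly this:
old = (x.float() * torch.rsqrt(x.float().pow(2).mean(-1, keepdim=True) + norm.eps)).type_as(x) * norm.weight
assert torch.equal(norm(x), old)
```
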
@@ -72,11 +60,18 @@ def apply_scaling(freqs: torch.Tensor):
     return torch.tensor(new_freqs, dtype=freqs.dtype, device=freqs.device)
 
 
-def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0, use_scaled: bool = False):
+def precompute_freqs_cis(
+    dim: int,
+    end: int,
+    theta: float,
+    use_scaled: bool,
+    scale_factor: float,
+    high_freq_factor: float,
+):
     freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim))
     t = torch.arange(end, device=freqs.device, dtype=torch.float32)
     if use_scaled:
-        freqs = apply_scaling(freqs)
+        freqs = apply_scaling(freqs, scale_factor, high_freq_factor)
     freqs = torch.outer(t, freqs)
     freqs_cis = torch.polar(torch.ones_like(freqs), freqs)  # complex64
     return freqs_cis
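To see what the now-parameterized scaling does, here is a self-contained sketch of the llama3-style, wavelength-based scaling that `apply_scaling` implements, exercised with the old hard-coded defaults (`scale_factor=8`, `high_freq_factor=4`) that this change removes:

```python
import math

import torch


def apply_scaling_sketch(freqs: torch.Tensor, scale_factor: float, high_freq_factor: float) -> torch.Tensor:
    # Illustrative reimplementation mirroring the llama3-style scaling above.
    low_freq_factor = 1
    old_context_len = 8192  # original llama3 length

    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor

    new_freqs = []
    for freq in freqs.tolist():
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            new_freqs.append(freq)  # high-frequency bands pass through unchanged
        elif wavelen > low_freq_wavelen:
            new_freqs.append(freq / scale_factor)  # low-frequency bands are stretched
        else:
            # smooth interpolation between the two regimes
            smooth = (old_context_len / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)
            new_freqs.append((1 - smooth) * freq / scale_factor + smooth * freq)
    return torch.tensor(new_freqs, dtype=freqs.dtype, device=freqs.device)


dim, theta = 128, 500000.0
freqs = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
scaled = apply_scaling_sketch(freqs, scale_factor=8.0, high_freq_factor=4.0)
print(freqs[0].item(), scaled[0].item())    # fastest band: unchanged
print(freqs[-1].item(), scaled[-1].item())  # slowest band: divided by scale_factor
```
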
@@ -174,9 +169,7 @@ class Attention(nn.Module):
                 self.head_dim,
             )
         ).cuda()
-        self.qk_norm = None
-        if self.use_qk_norm:
-            self.qk_norm = L2Norm(args.norm_eps)
+        self.norm_eps = args.norm_eps
         self._register_load_state_dict_pre_hook(self.load_hook)
 
     def load_hook(

@@ -220,8 +213,8 @@ class Attention(nn.Module):
         xq, xk = apply_rotary_emb(xq, xk, freqs_cis=freqs_cis)
 
         if self.use_qk_norm:
-            xq = self.qk_norm(xq)
-            xk = self.qk_norm(xk)
+            xq = rmsnorm(xq, self.norm_eps)
+            xk = rmsnorm(xk, self.norm_eps)
 
         # We are applying temperature tuning (https://arxiv.org/abs/2501.19399) to NoPE layers, where
         # the inference-time temperature tuning function is customized to not affect short context

@@ -362,6 +355,8 @@ class Transformer(nn.Module):
             args.max_seq_len * 2,
             args.rope_theta,
             args.use_scaled_rope,
+            args.rope_scaling_factor,
+            args.rope_high_freq_factor,
         )
         vision_args = self.args.vision_args
         if vision_args:
@@ -91,7 +91,7 @@ def convert_to_quantized_model(
         log_status(f"Rank {rank}: Quantizing int4 weights from bf16")
 
         def apply_quantization(_, weight):
-            return quantize_int4(weight, fp8_activation_scale_ub, output_device=torch.device("cuda"))
+            return quantize_int4(weight, output_device=torch.device("cuda"))
 
     else:
         fp8_scales_path = os.path.join(checkpoint_dir, f"fp8_scales_{rank}.pt")
@@ -65,7 +65,7 @@ class Int4Weights(
     Int4ScaledWeights,
     collections.namedtuple(
         "Int4Weights",
-        ["weight", "scale", "zero_point", "shape", "activation_scale_ub"],
+        ["weight", "scale", "zero_point", "shape"],
     ),
 ):
     pass

@@ -184,20 +184,13 @@ def quantize_fp8(
 @torch.inference_mode()
 def quantize_int4(
     w: Tensor,
-    fp8_activation_scale_ub: float,
     output_device: Optional[torch.device] = None,
 ) -> Int4Weights:
     """Quantize [n, k/2] weight tensor.
 
     Args:
         w (Tensor): [n, k/2] input high precision tensor to quantize.
-        fp8_activation_scale_ub (float): Upper bound for activation max.
     """
-    activation_scale_ub = torch.tensor(
-        [fp8_activation_scale_ub],
-        dtype=torch.float,
-        device=output_device,
-    )
     if w.ndim >= 3:
         wq, scale, zero_point = zip(*[int4_row_quantize(i) for i in w], strict=False)
         wq = torch.stack([pack_int4(i) for i in wq], dim=0)

@@ -212,7 +205,6 @@ def quantize_int4(
         scale=scale.to(output_device),
         zero_point=zero_point.to(output_device),
         shape=wq.shape,
-        activation_scale_ub=activation_scale_ub,
     )
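For orientation, the `[n, k/2]` shape comes from packing two signed 4-bit values into each int8 byte. A self-contained sketch of one such packing scheme (illustrative only; the real kernels use fbgemm's `int4_row_quantize`/`pack_int4`, whose exact nibble layout may differ):

```python
import torch


def pack_int4_sketch(q: torch.Tensor) -> torch.Tensor:
    """Pack an int8 tensor holding values in [-8, 7] into half as many bytes.

    q: [n, k] int8 with int4-range values -> [n, k // 2] int8,
    low nibble = even columns, high nibble = odd columns.
    """
    assert q.shape[-1] % 2 == 0
    lo = q[..., 0::2] & 0xF          # low nibble (two's complement within 4 bits)
    hi = (q[..., 1::2] & 0xF) << 4   # high nibble
    return (lo | hi).to(torch.int8)


def unpack_int4_sketch(packed: torch.Tensor) -> torch.Tensor:
    """Inverse of pack_int4_sketch, sign-extending each nibble back to int8."""
    lo = (packed & 0xF).to(torch.int8)
    hi = ((packed >> 4) & 0xF).to(torch.int8)
    # Sign-extend 4-bit two's complement values.
    lo = torch.where(lo > 7, lo - 16, lo)
    hi = torch.where(hi > 7, hi - 16, hi)
    out = torch.stack([lo, hi], dim=-1)
    return out.flatten(start_dim=-2)


q = torch.randint(-8, 8, (4, 8), dtype=torch.int8)
packed = pack_int4_sketch(q)  # shape [4, 4]: k/2 bytes per row
assert torch.equal(unpack_int4_sketch(packed), q)
```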
 
 
@@ -247,26 +239,18 @@ def load_int4(
     w: Tensor,
     scale: Tensor,
     zero_point: Tensor,
-    fp8_activation_scale_ub: float,
     output_device: Optional[torch.device] = None,
 ) -> Int4Weights:
     """Load INT4 [n, k/2] weight tensor.
 
     Args:
         w (Tensor): [n, k/2] input INT4.
-        fp8_activation_scale_ub (float): Upper bound for activation max.
     """
-    activation_scale_ub = torch.tensor(
-        [fp8_activation_scale_ub],
-        dtype=torch.float,
-        device=output_device,
-    )
     return Int4Weights(
         weight=w.to(torch.int8).to(device=output_device),
         scale=scale.to(device=output_device),
         zero_point=zero_point.to(device=output_device),
         shape=w.shape,
-        activation_scale_ub=activation_scale_ub,
     )
@@ -13,7 +13,7 @@ The `llamastack/distribution-{{ name }}` distribution consists of the following
 
 {{ providers_table }}
 
-You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference.
+You can use this distribution if you want to run an independent vLLM server for inference.
 
 {% if run_config_env_vars %}
 ### Environment Variables

@@ -28,7 +28,10 @@ The following environment variables can be configured:
 
 ## Setting up vLLM server
 
-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.
+In the following sections, we'll use either AMD or NVIDIA GPUs to serve as hardware accelerators for the vLLM
+server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
+[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
+that we only use GPUs here for demonstration purposes.
 
 ### Setting up vLLM server on AMD GPU
 
@@ -89,6 +89,12 @@ docs = [
     "tomli",
 ]
 codegen = ["rich", "pydantic", "jinja2>=3.1.6"]
+ui = [
+    "streamlit",
+    "pandas",
+    "llama-stack-client>=0.2.1",
+    "streamlit-option-menu",
+]
 
 [project.urls]
 Homepage = "https://github.com/meta-llama/llama-stack"
@@ -1,6 +1,6 @@
 # Test Results Report
 
-*Generated on: 2025-04-08 21:14:02*
+*Generated on: 2025-04-09 22:52:19*
 
 *This report was generated by running `python tests/verifications/generate_report.py`*
 

@@ -23,66 +23,107 @@
 
 ## Together
 
-*Tests run on: 2025-04-08 16:19:59*
+*Tests run on: 2025-04-09 22:50:58*
 
 ```bash
-pytest tests/verifications/openai/test_chat_completion.py --provider=together -v
+# Run all tests for this provider:
+pytest tests/verifications/openai_api/test_chat_completion.py --provider=together -v
+
+# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
+pytest tests/verifications/openai_api/test_chat_completion.py --provider=together -k "test_chat_non_streaming_basic and earth"
 ```
 
-| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-17B-128E-Instruct | Llama-4-Scout-17B-16E-Instruct |
+
+**Model Key (Together)**
+
+| Display Name | Full Model ID |
+| --- | --- |
+| Llama-3.3-70B-Instruct | `meta-llama/Llama-3.3-70B-Instruct-Turbo` |
+| Llama-4-Maverick-Instruct | `meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8` |
+| Llama-4-Scout-Instruct | `meta-llama/Llama-4-Scout-17B-16E-Instruct` |
+
+
+| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-Instruct | Llama-4-Scout-Instruct |
 | --- | --- | --- | --- |
-| test_chat_non_streaming_basic (case 0) | ✅ | ✅ | ✅ |
-| test_chat_non_streaming_basic (case 1) | ✅ | ✅ | ✅ |
-| test_chat_non_streaming_image (case 0) | ⚪ | ✅ | ✅ |
-| test_chat_non_streaming_structured_output (case 0) | ✅ | ✅ | ✅ |
-| test_chat_non_streaming_structured_output (case 1) | ✅ | ✅ | ✅ |
-| test_chat_non_streaming_tool_calling (case 0) | ✅ | ✅ | ✅ |
-| test_chat_streaming_basic (case 0) | ✅ | ❌ | ❌ |
-| test_chat_streaming_basic (case 1) | ✅ | ❌ | ❌ |
-| test_chat_streaming_image (case 0) | ⚪ | ❌ | ❌ |
-| test_chat_streaming_structured_output (case 0) | ✅ | ❌ | ❌ |
-| test_chat_streaming_structured_output (case 1) | ✅ | ❌ | ❌ |
+| test_chat_non_streaming_basic (earth) | ✅ | ✅ | ✅ |
+| test_chat_non_streaming_basic (saturn) | ✅ | ✅ | ✅ |
+| test_chat_non_streaming_image | ⚪ | ✅ | ✅ |
+| test_chat_non_streaming_structured_output (calendar) | ✅ | ✅ | ✅ |
+| test_chat_non_streaming_structured_output (math) | ✅ | ✅ | ✅ |
+| test_chat_non_streaming_tool_calling | ✅ | ✅ | ✅ |
+| test_chat_streaming_basic (earth) | ✅ | ❌ | ❌ |
+| test_chat_streaming_basic (saturn) | ✅ | ❌ | ❌ |
+| test_chat_streaming_image | ⚪ | ❌ | ❌ |
+| test_chat_streaming_structured_output (calendar) | ✅ | ❌ | ❌ |
+| test_chat_streaming_structured_output (math) | ✅ | ❌ | ❌ |
 
 ## Fireworks
 
-*Tests run on: 2025-04-08 16:18:28*
+*Tests run on: 2025-04-09 22:50:02*
 
 ```bash
-pytest tests/verifications/openai/test_chat_completion.py --provider=fireworks -v
+# Run all tests for this provider:
+pytest tests/verifications/openai_api/test_chat_completion.py --provider=fireworks -v
+
+# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
+pytest tests/verifications/openai_api/test_chat_completion.py --provider=fireworks -k "test_chat_non_streaming_basic and earth"
 ```
 
-| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-17B-128E-Instruct | Llama-4-Scout-17B-16E-Instruct |
+
+**Model Key (Fireworks)**
+
+| Display Name | Full Model ID |
+| --- | --- |
+| Llama-3.3-70B-Instruct | `accounts/fireworks/models/llama-v3p3-70b-instruct` |
+| Llama-4-Maverick-Instruct | `accounts/fireworks/models/llama4-maverick-instruct-basic` |
+| Llama-4-Scout-Instruct | `accounts/fireworks/models/llama4-scout-instruct-basic` |
+
+
+| Test | Llama-3.3-70B-Instruct | Llama-4-Maverick-Instruct | Llama-4-Scout-Instruct |
 | --- | --- | --- | --- |
-| test_chat_non_streaming_basic (case 0) | ✅ | ✅ | ✅ |
-| test_chat_non_streaming_basic (case 1) | ✅ | ✅ | ✅ |
-| test_chat_non_streaming_image (case 0) | ⚪ | ✅ | ✅ |
-| test_chat_non_streaming_structured_output (case 0) | ✅ | ✅ | ✅ |
-| test_chat_non_streaming_structured_output (case 1) | ✅ | ✅ | ✅ |
-| test_chat_non_streaming_tool_calling (case 0) | ✅ | ❌ | ❌ |
-| test_chat_streaming_basic (case 0) | ✅ | ✅ | ✅ |
-| test_chat_streaming_basic (case 1) | ✅ | ✅ | ✅ |
-| test_chat_streaming_image (case 0) | ⚪ | ✅ | ✅ |
-| test_chat_streaming_structured_output (case 0) | ✅ | ✅ | ✅ |
-| test_chat_streaming_structured_output (case 1) | ❌ | ✅ | ✅ |
+| test_chat_non_streaming_basic (earth) | ✅ | ✅ | ✅ |
+| test_chat_non_streaming_basic (saturn) | ✅ | ✅ | ✅ |
+| test_chat_non_streaming_image | ⚪ | ✅ | ✅ |
+| test_chat_non_streaming_structured_output (calendar) | ✅ | ✅ | ✅ |
+| test_chat_non_streaming_structured_output (math) | ✅ | ✅ | ✅ |
+| test_chat_non_streaming_tool_calling | ❌ | ❌ | ❌ |
+| test_chat_streaming_basic (earth) | ✅ | ✅ | ✅ |
+| test_chat_streaming_basic (saturn) | ✅ | ✅ | ✅ |
+| test_chat_streaming_image | ⚪ | ✅ | ✅ |
+| test_chat_streaming_structured_output (calendar) | ✅ | ✅ | ✅ |
+| test_chat_streaming_structured_output (math) | ✅ | ✅ | ✅ |
 
 ## Openai
 
-*Tests run on: 2025-04-08 16:22:02*
+*Tests run on: 2025-04-09 22:51:44*
 
 ```bash
-pytest tests/verifications/openai/test_chat_completion.py --provider=openai -v
+# Run all tests for this provider:
+pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai -v
+
+# Example: Run only the 'earth' case of test_chat_non_streaming_basic:
+pytest tests/verifications/openai_api/test_chat_completion.py --provider=openai -k "test_chat_non_streaming_basic and earth"
 ```
 
+
+**Model Key (Openai)**
+
+| Display Name | Full Model ID |
+| --- | --- |
+| gpt-4o | `gpt-4o` |
+| gpt-4o-mini | `gpt-4o-mini` |
+
+
 | Test | gpt-4o | gpt-4o-mini |
 | --- | --- | --- |
-| test_chat_non_streaming_basic (case 0) | ✅ | ✅ |
-| test_chat_non_streaming_basic (case 1) | ✅ | ✅ |
-| test_chat_non_streaming_image (case 0) | ✅ | ✅ |
-| test_chat_non_streaming_structured_output (case 0) | ✅ | ✅ |
-| test_chat_non_streaming_structured_output (case 1) | ✅ | ✅ |
-| test_chat_non_streaming_tool_calling (case 0) | ✅ | ✅ |
-| test_chat_streaming_basic (case 0) | ✅ | ✅ |
-| test_chat_streaming_basic (case 1) | ✅ | ✅ |
-| test_chat_streaming_image (case 0) | ✅ | ✅ |
-| test_chat_streaming_structured_output (case 0) | ✅ | ✅ |
-| test_chat_streaming_structured_output (case 1) | ✅ | ✅ |
+| test_chat_non_streaming_basic (earth) | ✅ | ✅ |
+| test_chat_non_streaming_basic (saturn) | ✅ | ✅ |
+| test_chat_non_streaming_image | ✅ | ✅ |
+| test_chat_non_streaming_structured_output (calendar) | ✅ | ✅ |
+| test_chat_non_streaming_structured_output (math) | ✅ | ✅ |
+| test_chat_non_streaming_tool_calling | ✅ | ✅ |
+| test_chat_streaming_basic (earth) | ✅ | ✅ |
+| test_chat_streaming_basic (saturn) | ✅ | ✅ |
+| test_chat_streaming_image | ✅ | ✅ |
+| test_chat_streaming_structured_output (calendar) | ✅ | ✅ |
+| test_chat_streaming_structured_output (math) | ✅ | ✅ |
tests/verifications/conf/cerebras.yaml (new file, 10 lines)

@@ -0,0 +1,10 @@
+base_url: https://api.cerebras.ai/v1
+api_key_var: CEREBRAS_API_KEY
+models:
+  - llama-3.3-70b
+model_display_names:
+  llama-3.3-70b: Llama-3.3-70B-Instruct
+test_exclusions:
+  llama-3.3-70b:
+    - test_chat_non_streaming_image
+    - test_chat_streaming_image
tests/verifications/conf/fireworks.yaml (new file, 14 lines)

@@ -0,0 +1,14 @@
+base_url: https://api.fireworks.ai/inference/v1
+api_key_var: FIREWORKS_API_KEY
+models:
+  - accounts/fireworks/models/llama-v3p3-70b-instruct
+  - accounts/fireworks/models/llama4-scout-instruct-basic
+  - accounts/fireworks/models/llama4-maverick-instruct-basic
+model_display_names:
+  accounts/fireworks/models/llama-v3p3-70b-instruct: Llama-3.3-70B-Instruct
+  accounts/fireworks/models/llama4-scout-instruct-basic: Llama-4-Scout-Instruct
+  accounts/fireworks/models/llama4-maverick-instruct-basic: Llama-4-Maverick-Instruct
+test_exclusions:
+  accounts/fireworks/models/llama-v3p3-70b-instruct:
+    - test_chat_non_streaming_image
+    - test_chat_streaming_image
tests/verifications/conf/groq.yaml (new file, 14 lines)

@@ -0,0 +1,14 @@
+base_url: https://api.groq.com/openai/v1
+api_key_var: GROQ_API_KEY
+models:
+  - llama-3.3-70b-versatile
+  - llama-4-scout-17b-16e-instruct
+  - llama-4-maverick-17b-128e-instruct
+model_display_names:
+  llama-3.3-70b-versatile: Llama-3.3-70B-Instruct
+  llama-4-scout-17b-16e-instruct: Llama-4-Scout-Instruct
+  llama-4-maverick-17b-128e-instruct: Llama-4-Maverick-Instruct
+test_exclusions:
+  llama-3.3-70b-versatile:
+    - test_chat_non_streaming_image
+    - test_chat_streaming_image
tests/verifications/conf/openai.yaml (new file, 9 lines)

@@ -0,0 +1,9 @@
+base_url: https://api.openai.com/v1
+api_key_var: OPENAI_API_KEY
+models:
+  - gpt-4o
+  - gpt-4o-mini
+model_display_names:
+  gpt-4o: gpt-4o
+  gpt-4o-mini: gpt-4o-mini
+test_exclusions: {}
tests/verifications/conf/together.yaml (new file, 14 lines)

@@ -0,0 +1,14 @@
+base_url: https://api.together.xyz/v1
+api_key_var: TOGETHER_API_KEY
+models:
+  - meta-llama/Llama-3.3-70B-Instruct-Turbo
+  - meta-llama/Llama-4-Scout-17B-16E-Instruct
+  - meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
+model_display_names:
+  meta-llama/Llama-3.3-70B-Instruct-Turbo: Llama-3.3-70B-Instruct
+  meta-llama/Llama-4-Scout-17B-16E-Instruct: Llama-4-Scout-Instruct
+  meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8: Llama-4-Maverick-Instruct
+test_exclusions:
+  meta-llama/Llama-3.3-70B-Instruct-Turbo:
+    - test_chat_non_streaming_image
+    - test_chat_streaming_image
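These per-provider YAML files are consumed by the verification fixtures. A minimal sketch of a loader in the spirit of `_load_all_verification_configs` (the real function lives in `tests/verifications/openai_api/fixtures/fixtures.py`; this standalone approximation assumes it is run from the repo root):

```python
from pathlib import Path

import yaml

CONF_DIR = Path("tests/verifications/conf")


def load_all_verification_configs() -> dict:
    """Merge every conf/<provider>.yaml into {"providers": {name: config}}."""
    providers = {}
    for conf_file in sorted(CONF_DIR.glob("*.yaml")):
        with open(conf_file) as f:
            providers[conf_file.stem] = yaml.safe_load(f)
    return {"providers": providers}


configs = load_all_verification_configs()
print(configs["providers"]["together"]["model_display_names"])
```
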
@@ -4,6 +4,10 @@
 # This source code is licensed under the terms described in the LICENSE file in
 # the root directory of this source tree.
 
+import re
+
+import pytest
+
 
 def pytest_addoption(parser):
     parser.addoption(

@@ -14,7 +18,7 @@ def pytest_addoption(parser):
     parser.addoption(
         "--api-key",
         action="store",
-        help="API key",
+        help="API key to use for the provider",
     )
     parser.addoption(
         "--provider",

@@ -24,5 +28,64 @@ def pytest_addoption(parser):
 
 
 pytest_plugins = [
-    "tests.verifications.openai.fixtures.fixtures",
+    "pytest_jsonreport",
+    "tests.verifications.openai_api.fixtures.fixtures",
+    "tests.verifications.openai_api.fixtures.load",
 ]
+
+
+@pytest.hookimpl(optionalhook=True)
+def pytest_json_runtest_metadata(item, call):
+    """Add model and case_id to pytest-json report metadata."""
+    metadata = {}
+    nodeid = item.nodeid
+
+    # 1. Extract model from callspec if available
+    model = item.callspec.params.get("model") if hasattr(item, "callspec") else None
+    if model:
+        metadata["model"] = model
+    else:
+        # Fallback: Try parsing from nodeid (less reliable)
+        match_model = re.search(r"\[(.*?)-", nodeid)
+        if match_model:
+            model = match_model.group(1)  # Store model even if found via fallback
+            metadata["model"] = model
+        else:
+            print(f"Warning: Could not determine model for test {nodeid}")
+            model = None  # Ensure model is None if not found
+
+    # 2. Extract case_id using the known model string if possible
+    if model:
+        # Construct a regex pattern to find the case_id *after* the model name and a hyphen.
+        # Escape the model name in case it contains regex special characters.
+        pattern = re.escape(model) + r"-(.*?)\]$"
+        match_case = re.search(pattern, nodeid)
+        if match_case:
+            case_id = match_case.group(1)
+            metadata["case_id"] = case_id
+        else:
+            # Fallback if the pattern didn't match (e.g., nodeid format unexpected)
+            # Try the old less specific regex as a last resort.
+            match_case_fallback = re.search(r"-(.*?)\]$", nodeid)
+            if match_case_fallback:
+                case_id = match_case_fallback.group(1)
+                metadata["case_id"] = case_id
+                print(f"Warning: Used fallback regex to parse case_id from nodeid {nodeid}")
+            else:
+                print(f"Warning: Could not parse case_id from nodeid {nodeid} even with fallback.")
+                if "case" in (item.callspec.params if hasattr(item, "callspec") else {}):
+                    metadata["case_id"] = "parsing_failed"
+    elif "case" in (item.callspec.params if hasattr(item, "callspec") else {}):
+        # Cannot reliably parse case_id without model, but we know it's a case test.
+        # Try the generic fallback regex.
+        match_case_fallback = re.search(r"-(.*?)\]$", nodeid)
+        if match_case_fallback:
+            case_id = match_case_fallback.group(1)
+            metadata["case_id"] = case_id
+            print(f"Warning: Used fallback regex to parse case_id from nodeid {nodeid} (model unknown)")
+        else:
+            print(f"Warning: Could not parse case_id from nodeid {nodeid} (model unknown)")
+            metadata["case_id"] = "parsing_failed_no_model"
+    # else: Not a test with a model or case param we need to handle.
+
+    return metadata
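To make the regex logic concrete, a small self-contained sketch of what the hook extracts from a typical parametrized nodeid (the nodeid below is illustrative):

```python
import re

# Hypothetical nodeid for a parametrized test: [<model>-<case_id>]
nodeid = "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-earth]"

model = "gpt-4o"  # normally taken from item.callspec.params["model"]
match = re.search(re.escape(model) + r"-(.*?)\]$", nodeid)
print(match.group(1))  # -> "earth"
```

Escaping the model name and anchoring on the closing bracket is what lets the hook split `gpt-4o-earth` correctly even though the model name itself contains hyphens.
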
@@ -4,27 +4,48 @@
 # This source code is licensed under the terms described in the LICENSE file in
 # the root directory of this source tree.
 
+# /// script
+# requires-python = ">=3.10"
+# dependencies = [
+#     "pytest-json-report",
+#     "pyyaml",
+# ]
+# ///
 """
 Test Report Generator
 
-Requirements:
-    pip install pytest-json-report
+Description:
+    This script runs pytest tests (specifically designed for OpenAI API compatibility checks)
+    for different providers, aggregates the results from JSON reports, and generates
+    a markdown summary report (REPORT.md).
+
+    It automatically cleans up old test result files, keeping only the latest
+    per provider.
+
+
+Configuration:
+    - Provider details (models, display names) are loaded from `tests/verifications/config.yaml`.
+    - Test cases are defined in YAML files within `tests/verifications/openai_api/fixtures/test_cases/`.
+    - Test results are stored in `tests/verifications/test_results/`.
 
 Usage:
-    # Generate a report using existing test results
+    # Generate a report using the latest existing test results
     python tests/verifications/generate_report.py
 
-    # Run tests and generate a report
+    # Run tests for all configured providers and generate a report
     python tests/verifications/generate_report.py --run-tests
 
-    # Run tests for specific providers
+    # Run tests only for specific providers (space-separated)
     python tests/verifications/generate_report.py --run-tests --providers fireworks openai
 
+    # Run tests matching a keyword expression (uses pytest -k)
+    python tests/verifications/generate_report.py --run-tests --providers fireworks --k "streaming"
+
+    # Run a specific test case for a provider
+    python tests/verifications/generate_report.py --run-tests --providers fireworks --k "test_chat_streaming_basic and basic_earth"
+
     # Save the report to a custom location
     python tests/verifications/generate_report.py --output custom_report.md
 
     # Clean up old test result files
     python tests/verifications/generate_report.py --cleanup
 """
 
 import argparse
|
|||
import time
|
||||
from collections import defaultdict
|
||||
from pathlib import Path
|
||||
from typing import Any, DefaultDict, Dict, Set, Tuple
|
||||
|
||||
from tests.verifications.openai_api.fixtures.fixtures import _load_all_verification_configs
|
||||
|
||||
# Define the root directory for test results
|
||||
RESULTS_DIR = Path(__file__).parent / "test_results"
|
||||
|
@ -43,17 +67,12 @@ RESULTS_DIR.mkdir(exist_ok=True)
|
|||
# Maximum number of test result files to keep per provider
|
||||
MAX_RESULTS_PER_PROVIDER = 1
|
||||
|
||||
# Custom order of providers
|
||||
PROVIDER_ORDER = ["together", "fireworks", "groq", "cerebras", "openai"]
|
||||
|
||||
# Dictionary to store providers and their models (will be populated dynamically)
|
||||
PROVIDERS = defaultdict(set)
|
||||
|
||||
# Tests will be dynamically extracted from results
|
||||
ALL_TESTS = set()
|
||||
VERIFICATION_CONFIG = _load_all_verification_configs()
|
||||
|
||||
|
||||
def run_tests(provider):
|
||||
def run_tests(provider, keyword=None):
|
||||
"""Run pytest for a specific provider and save results"""
|
||||
print(f"Running tests for provider: {provider}")
|
||||
|
||||
|
@ -61,20 +80,28 @@ def run_tests(provider):
|
|||
result_file = RESULTS_DIR / f"{provider}_{timestamp}.json"
|
||||
temp_json_file = RESULTS_DIR / f"temp_{provider}_{timestamp}.json"
|
||||
|
||||
# Determine project root directory relative to this script
|
||||
project_root = Path(__file__).parent.parent.parent
|
||||
|
||||
# Run pytest with JSON output
|
||||
cmd = [
|
||||
"python",
|
||||
"-m",
|
||||
"pytest",
|
||||
"tests/verifications/openai/test_chat_completion.py",
|
||||
"tests/verifications/openai_api/test_chat_completion.py",
|
||||
f"--provider={provider}",
|
||||
"-v",
|
||||
"--json-report",
|
||||
f"--json-report-file={temp_json_file}",
|
||||
]
|
||||
|
||||
# Append -k argument if provided
|
||||
if keyword:
|
||||
cmd.extend(["-k", keyword])
|
||||
|
||||
try:
|
||||
result = subprocess.run(cmd, capture_output=True, text=True)
|
||||
# Run subprocess with cwd set to project root
|
||||
result = subprocess.run(cmd, capture_output=True, text=True, cwd=project_root)
|
||||
print(f"Pytest exit code: {result.returncode}")
|
||||
|
||||
# Check if the JSON file was created
|
||||
|
@ -103,18 +130,30 @@ def run_tests(provider):
|
|||
return None
|
||||
|
||||
|
||||
def parse_results(result_file):
|
||||
"""Parse the test results file and extract pass/fail by model and test"""
|
||||
def parse_results(
|
||||
result_file,
|
||||
) -> Tuple[DefaultDict[str, DefaultDict[str, Dict[str, bool]]], DefaultDict[str, Set[str]], Set[str]]:
|
||||
"""Parse a single test results file.
|
||||
|
||||
Returns:
|
||||
Tuple containing:
|
||||
- parsed_results: DefaultDict[provider, DefaultDict[model, Dict[test_name, pass_status]]]
|
||||
- providers_in_file: DefaultDict[provider, Set[model]] found in this file.
|
||||
- tests_in_file: Set[test_name] found in this file.
|
||||
"""
|
||||
if not os.path.exists(result_file):
|
||||
print(f"Results file does not exist: {result_file}")
|
||||
return {}
|
||||
# Return empty defaultdicts/set matching the type hint
|
||||
return defaultdict(lambda: defaultdict(dict)), defaultdict(set), set()
|
||||
|
||||
with open(result_file, "r") as f:
|
||||
results = json.load(f)
|
||||
|
||||
# Initialize results dictionary
|
||||
parsed_results = defaultdict(lambda: defaultdict(dict))
|
||||
provider = os.path.basename(result_file).split("_")[0]
|
||||
# Initialize results dictionary with specific types
|
||||
parsed_results: DefaultDict[str, DefaultDict[str, Dict[str, bool]]] = defaultdict(lambda: defaultdict(dict))
|
||||
providers_in_file: DefaultDict[str, Set[str]] = defaultdict(set)
|
||||
tests_in_file: Set[str] = set()
|
||||
provider: str = os.path.basename(result_file).split("_")[0]
|
||||
|
||||
# Debug: Print summary of test results
|
||||
print(f"Test results summary for {provider}:")
|
||||
|
@ -127,124 +166,72 @@ def parse_results(result_file):
|
|||
# Extract test results
|
||||
if "tests" not in results or not results["tests"]:
|
||||
print(f"No test results found in {result_file}")
|
||||
return parsed_results
|
||||
# Return empty defaultdicts/set matching the type hint
|
||||
return defaultdict(lambda: defaultdict(dict)), defaultdict(set), set()
|
||||
|
||||
# Map for normalizing model names
|
||||
model_name_map = {
|
||||
"Llama-3.3-8B-Instruct": "Llama-3.3-8B-Instruct",
|
||||
"Llama-3.3-70B-Instruct": "Llama-3.3-70B-Instruct",
|
||||
"Llama-3.2-11B-Vision-Instruct": "Llama-3.2-11B-Vision-Instruct",
|
||||
"Llama-4-Scout-17B-16E": "Llama-4-Scout-17B-16E-Instruct",
|
||||
"Llama-4-Scout-17B-16E-Instruct": "Llama-4-Scout-17B-16E-Instruct",
|
||||
"Llama-4-Maverick-17B-128E": "Llama-4-Maverick-17B-128E-Instruct",
|
||||
"Llama-4-Maverick-17B-128E-Instruct": "Llama-4-Maverick-17B-128E-Instruct",
|
||||
"gpt-4o": "gpt-4o",
|
||||
"gpt-4o-mini": "gpt-4o-mini",
|
||||
}
|
||||
|
||||
# Keep track of all models found for this provider
|
||||
provider_models = set()
|
||||
|
||||
# Track all unique test cases for each base test
|
||||
test_case_counts = defaultdict(int)
|
||||
|
||||
# First pass: count the number of cases for each test
|
||||
# Process the tests
|
||||
for test in results["tests"]:
|
||||
test_id = test.get("nodeid", "")
|
||||
|
||||
if "call" in test:
|
||||
test_name = test_id.split("::")[1].split("[")[0]
|
||||
input_output_match = re.search(r"\[input_output(\d+)-", test_id)
|
||||
if input_output_match:
|
||||
test_case_counts[test_name] += 1
|
||||
if not (call_phase := test.get("call")):
|
||||
continue
|
||||
call_outcome = call_phase.get("outcome")
|
||||
if call_outcome not in ("passed", "failed"):
|
||||
continue
|
||||
|
||||
# Second pass: process the tests with case numbers only for tests with multiple cases
|
||||
for test in results["tests"]:
|
||||
test_id = test.get("nodeid", "")
|
||||
outcome = test.get("outcome", "")
|
||||
# --- Extract data from metadata ---
|
||||
metadata = test.get("metadata", {})
|
||||
model = metadata.get("model")
|
||||
case_id = metadata.get("case_id") # String ID (if provided)
|
||||
case_index = metadata.get("case_index") # Integer index (if no ID provided)
|
||||
|
||||
# Only process tests that have been executed (not setup errors)
|
||||
if "call" in test:
|
||||
# Regular test that actually ran
|
||||
test_name = test_id.split("::")[1].split("[")[0]
|
||||
# Check if we have a model and at least one case identifier
|
||||
if not model or (case_id is None and case_index is None):
|
||||
print(
|
||||
f"Warning: Missing 'model' or case identifier ('case_id'/'case_index') metadata for test: {test_id}. Skipping."
|
||||
)
|
||||
continue
|
||||
|
||||
# Extract input_output parameter to differentiate between test cases
|
||||
input_output_match = re.search(r"\[input_output(\d+)-", test_id)
|
||||
input_output_index = input_output_match.group(1) if input_output_match else ""
|
||||
try:
|
||||
test_name_base = test_id.split("::")[1].split("[")[0]
|
||||
except (IndexError, ValueError) as e:
|
||||
print(f"Warning: Could not parse base test name for {test_id}. Error: {e}. Skipping.")
|
||||
continue
|
||||
|
||||
# Create a more detailed test name with case number only if there are multiple cases
|
||||
detailed_test_name = test_name
|
||||
if input_output_index and test_case_counts[test_name] > 1:
|
||||
detailed_test_name = f"{test_name} (case {input_output_index})"
|
||||
# Construct detailed test name using ID or index
|
||||
if case_id is not None:
|
||||
detailed_test_name = f"{test_name_base} ({case_id})"
|
||||
elif case_index == 0:
|
||||
# If case_id is missing and index is 0, assume single case, use base name only
|
||||
detailed_test_name = test_name_base
|
||||
elif case_index is not None: # case_index > 0
|
||||
# Use case_index for naming if case_id wasn't provided and index > 0
|
||||
detailed_test_name = f"{test_name_base} (case{case_index})"
|
||||
else:
|
||||
# This case should be prevented by the earlier check, but handle defensively
|
||||
print(f"Error: No case identifier found for test {test_id} after initial check. Skipping.")
|
||||
continue
|
||||
|
||||
# Track all unique test names
|
||||
ALL_TESTS.add(detailed_test_name)
|
||||
# Populate collections for this file
|
||||
tests_in_file.add(detailed_test_name)
|
||||
providers_in_file[provider].add(model)
|
||||
|
||||
# Extract model name from test_id using a more robust pattern
|
||||
model_match = re.search(r"\[input_output\d+-([^\]]+)\]", test_id)
|
||||
if model_match:
|
||||
raw_model = model_match.group(1)
|
||||
model = model_name_map.get(raw_model, raw_model)
|
||||
if call_outcome == "passed":
|
||||
parsed_results[provider][model][detailed_test_name] = True
|
||||
elif call_outcome == "failed":
|
||||
parsed_results[provider][model][detailed_test_name] = False
|
||||
|
||||
# Add to set of known models for this provider
|
||||
provider_models.add(model)
|
||||
# Final Summary Warning (Optional)
|
||||
if not parsed_results.get(provider):
|
||||
print(f"Warning: No valid test results parsed for provider {provider} from file {result_file}")
|
||||
|
||||
# Also update the global PROVIDERS dictionary
|
||||
PROVIDERS[provider].add(model)
|
||||
|
||||
# Store the result
|
||||
if outcome == "passed":
|
||||
parsed_results[provider][model][detailed_test_name] = True
|
||||
else:
|
||||
parsed_results[provider][model][detailed_test_name] = False
|
||||
|
||||
print(f"Parsed test result: {detailed_test_name} for model {model}: {outcome}")
|
||||
elif outcome == "error" and "setup" in test and test.get("setup", {}).get("outcome") == "failed":
|
||||
# This is a setup failure, which likely means a configuration issue
|
||||
# Extract the base test name and model name
|
||||
parts = test_id.split("::")
|
||||
if len(parts) > 1:
|
||||
test_name = parts[1].split("[")[0]
|
||||
|
||||
# Extract input_output parameter to differentiate between test cases
|
||||
input_output_match = re.search(r"\[input_output(\d+)-", test_id)
|
||||
input_output_index = input_output_match.group(1) if input_output_match else ""
|
||||
|
||||
# Create a more detailed test name with case number only if there are multiple cases
|
||||
detailed_test_name = test_name
|
||||
if input_output_index and test_case_counts[test_name] > 1:
|
||||
detailed_test_name = f"{test_name} (case {input_output_index})"
|
||||
|
||||
if detailed_test_name in ALL_TESTS:
|
||||
# Use a more robust pattern for model extraction
|
||||
model_match = re.search(r"\[input_output\d+-([^\]]+)\]", test_id)
|
||||
if model_match:
|
||||
raw_model = model_match.group(1)
|
||||
model = model_name_map.get(raw_model, raw_model)
|
||||
|
||||
# Add to set of known models for this provider
|
||||
provider_models.add(model)
|
||||
|
||||
# Also update the global PROVIDERS dictionary
|
||||
PROVIDERS[provider].add(model)
|
||||
|
||||
# Mark setup failures as false (failed)
|
||||
parsed_results[provider][model][detailed_test_name] = False
|
||||
print(f"Parsed setup failure: {detailed_test_name} for model {model}")
|
||||
|
||||
# Debug: Print parsed results
|
||||
if not parsed_results[provider]:
|
||||
print(f"Warning: No test results parsed for provider {provider}")
|
||||
else:
|
||||
for model, tests in parsed_results[provider].items():
|
||||
print(f"Model {model}: {len(tests)} test results")
|
||||
|
||||
return parsed_results
|
||||
return parsed_results, providers_in_file, tests_in_file
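For reference, the shape of the pytest-json-report payload this parser consumes looks roughly like the following; values are illustrative, and the `metadata` block is the one injected by the `pytest_json_runtest_metadata` hook in `conftest.py`:

```python
# Abbreviated, hand-written example of one entry in results["tests"].
example_test_entry = {
    "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-earth]",
    "outcome": "passed",
    "call": {"outcome": "passed", "duration": 1.23},
    "metadata": {"model": "gpt-4o", "case_id": "earth"},
}
```
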
 
 
-def cleanup_old_results():
-    """Clean up old test result files, keeping only the newest N per provider"""
-    for provider in PROVIDERS.keys():
+def cleanup_old_results(providers_to_clean: Dict[str, Set[str]]):
+    """Clean up old test result files, keeping only the newest N per provider."""
+    # Use the passed-in providers dictionary
+    for provider in providers_to_clean.keys():
         # Get all result files for this provider
         provider_files = list(RESULTS_DIR.glob(f"{provider}_*.json"))
 

@@ -289,8 +276,17 @@ def get_latest_results_by_provider():
     return provider_results
 
 
-def generate_report(results_dict, output_file=None):
-    """Generate the markdown report"""
+def generate_report(
+    results_dict: Dict[str, Any], providers: Dict[str, Set[str]], all_tests: Set[str], output_file=None
+):
+    """Generate the markdown report.
+
+    Args:
+        results_dict: Aggregated results [provider][model][test_name] -> status.
+        providers: Dict of all providers and their models {provider: {models}}.
+        all_tests: Set of all test names found.
+        output_file: Optional path to save the report.
+    """
     if output_file is None:
         # Default to creating the report in the same directory as this script
         output_file = Path(__file__).parent / "REPORT.md"

@@ -299,8 +295,8 @@
 
     # Get the timestamp from result files
     provider_timestamps = {}
-    provider_results = get_latest_results_by_provider()
-    for provider, result_file in provider_results.items():
+    provider_results_files = get_latest_results_by_provider()
+    for provider, result_file in provider_results_files.items():
         # Extract timestamp from filename (format: provider_timestamp.json)
         try:
             timestamp_str = result_file.stem.split("_")[1]

@@ -310,12 +306,33 @@
         except (IndexError, ValueError):
             provider_timestamps[provider] = "Unknown"
 
-    # Convert provider model sets to sorted lists
-    for provider in PROVIDERS:
-        PROVIDERS[provider] = sorted(PROVIDERS[provider])
+    # Convert provider model sets to sorted lists (use passed-in providers dict)
+    providers_sorted = {prov: sorted(models) for prov, models in providers.items()}
 
-    # Sort tests alphabetically
-    sorted_tests = sorted(ALL_TESTS)
+    # Sort tests alphabetically (use passed-in all_tests set)
+    sorted_tests = sorted(all_tests)
+
+    # Calculate counts for each base test name
+    base_test_case_counts: DefaultDict[str, int] = defaultdict(int)
+    base_test_name_map: Dict[str, str] = {}
+    for test_name in sorted_tests:
+        match = re.match(r"^(.*?)( \([^)]+\))?$", test_name)
+        if match:
+            base_name = match.group(1).strip()
+            base_test_case_counts[base_name] += 1
+            base_test_name_map[test_name] = base_name
+        else:
+            # Should not happen with current naming, but handle defensively
+            base_test_case_counts[test_name] += 1
+            base_test_name_map[test_name] = test_name
+
+    if not sorted_tests:
+        print("Warning: No test results found to generate a report.")
+        # Optionally create an empty report or return early
+        with open(output_file, "w") as f:
+            f.write("# Test Results Report\n\nNo test results found.\n")
+        print(f"Generated empty report: {output_file}")
+        return
 
     report = ["# Test Results Report\n"]
     report.append(f"*Generated on: {time.strftime('%Y-%m-%d %H:%M:%S')}*\n")

@@ -336,19 +353,15 @@
     # Add a summary section
     report.append("## Summary\n")
 
-    # Count total tests and passes
+    # Count total tests and passes (use passed-in providers and all_tests)
     total_tests = 0
     passed_tests = 0
     provider_totals = {}
 
     # Prepare summary data
-    for provider in PROVIDERS.keys():
+    for provider, models in providers_sorted.items():
         provider_passed = 0
         provider_total = 0
 
         if provider in results_dict:
-            provider_models = PROVIDERS[provider]
-            for model in provider_models:
+            for model in models:
                 if model in results_dict[provider]:
                     model_results = results_dict[provider][model]
                     for test in sorted_tests:

@@ -358,33 +371,26 @@
                         if model_results[test]:
                             provider_passed += 1
                             passed_tests += 1
 
         provider_totals[provider] = (provider_passed, provider_total)
 
-    # Add summary table
+    # Add summary table (use passed-in providers dict)
     report.append("| Provider | Pass Rate | Tests Passed | Total Tests |")
     report.append("| --- | --- | --- | --- |")
 
     # Use the custom order for summary table
-    for provider in [p for p in PROVIDER_ORDER if p in PROVIDERS]:
+    for provider in [p for p in PROVIDER_ORDER if p in providers]:  # Check against keys of passed-in dict
         passed, total = provider_totals.get(provider, (0, 0))
         pass_rate = f"{(passed / total * 100):.1f}%" if total > 0 else "N/A"
         report.append(f"| {provider.capitalize()} | {pass_rate} | {passed} | {total} |")
 
     # Add providers not in the custom order
-    for provider in [p for p in PROVIDERS if p not in PROVIDER_ORDER]:
+    for provider in [p for p in providers if p not in PROVIDER_ORDER]:  # Check against keys of passed-in dict
         passed, total = provider_totals.get(provider, (0, 0))
         pass_rate = f"{(passed / total * 100):.1f}%" if total > 0 else "N/A"
         report.append(f"| {provider.capitalize()} | {pass_rate} | {passed} | {total} |")
 
     report.append("\n")
 
     # Process each provider in the custom order, then any additional providers
     for provider in sorted(
-        PROVIDERS.keys(), key=lambda p: (PROVIDER_ORDER.index(p) if p in PROVIDER_ORDER else float("inf"), p)
+        providers_sorted.keys(), key=lambda p: (PROVIDER_ORDER.index(p) if p in PROVIDER_ORDER else float("inf"), p)
     ):
-        if not PROVIDERS[provider]:
-            # Skip providers with no models
+        provider_models = providers_sorted[provider]  # Use sorted models
+        if not provider_models:
             continue
 
         report.append(f"\n## {provider.capitalize()}\n")

@@ -394,34 +400,70 @@
         report.append(f"*Tests run on: {provider_timestamps[provider]}*\n")
 
         # Add test command for reproducing results
-        test_cmd = f"pytest tests/verifications/openai/test_chat_completion.py --provider={provider} -v"
-        report.append(f"```bash\n{test_cmd}\n```\n")
+        test_cmd_all = f"pytest tests/verifications/openai_api/test_chat_completion.py --provider={provider} -v"
+        report.append(f"```bash\n# Run all tests for this provider:\n{test_cmd_all}\n")
 
-        # Get the relevant models for this provider
-        provider_models = PROVIDERS[provider]
+        # Find an example test with a case ID
+        example_base_test_name = None
+        example_case_id = None
+        # Get first test as fallback base, handle empty list
+        first_test_name = sorted_tests[0] if sorted_tests else "unknown_test"
 
-        # Create table header with models as columns
-        header = "| Test | " + " | ".join(provider_models) + " |"
+        match = re.match(r"^(.*?) \((.*?)\)$", first_test_name)
+        if match:
+            example_base_test_name = match.group(1).strip()
+            example_case_id = match.group(2).strip()
+        else:
+            example_base_test_name = first_test_name
+
+        case_count = base_test_case_counts.get(example_base_test_name, 1)  # Case count for the example test
+        filter_str = f"{example_base_test_name} and {example_case_id}" if case_count > 1 else example_base_test_name
+
+        test_cmd_specific_case = (
+            f'pytest tests/verifications/openai_api/test_chat_completion.py --provider={provider} -k "{filter_str}"'
+        )
+        report.append(
+            f"# Example: Run only the '{example_case_id}' case of {example_base_test_name}:\n{test_cmd_specific_case}\n```\n"
+        )
+
+        # Get display names (use passed-in providers dict)
+        provider_config = VERIFICATION_CONFIG.get("providers", {}).get(provider, {})
+        display_name_map = provider_config.get("model_display_names", {})
+
+        # Add Model Key Table (use provider_models)
+        report.append(f"\n**Model Key ({provider.capitalize()})**\n")
+        provider_key_lines = ["| Display Name | Full Model ID |", "| --- | --- |"]
+        for model_id in provider_models:
+            display_name = display_name_map.get(model_id, model_id)
+            provider_key_lines.append(f"| {display_name} | `{model_id}` |")
+        report.extend(provider_key_lines)
+        report.append("\n")
+
+        # Create results table header (use provider_models)
+        display_names = [display_name_map.get(m, m) for m in provider_models]
+        header = "| Test | " + " | ".join(display_names) + " |"
         separator = "| --- | " + " | ".join(["---"] * len(provider_models)) + " |"
 
         report.append(header)
         report.append(separator)
 
-        # Get results for this provider
-        provider_results = results_dict.get(provider, {})
+        # Get results for this provider from results_dict
+        provider_results_data = results_dict.get(provider, {})
 
-        # Add rows for each test
+        # Add rows for each test (use sorted_tests)
         for test in sorted_tests:
-            row = f"| {test} |"
+            # Determine display name based on case count
+            base_name = base_test_name_map.get(test, test)  # Get base name
+            case_count = base_test_case_counts.get(base_name, 1)  # Get count
+            display_test_name = base_name if case_count == 1 else test  # Choose display name
+            row = f"| {display_test_name} |"  # Use display name
 
             # Add results for each model in this test
-            for model in provider_models:
-                if model in provider_results and test in provider_results[model]:
-                    result = pass_icon if provider_results[model][test] else fail_icon
+            for model_id in provider_models:
+                if model_id in provider_results_data and test in provider_results_data[model_id]:
+                    result = pass_icon if provider_results_data[model_id][test] else fail_icon
                 else:
                     result = na_icon
                 row += f" {result} |"
 
             report.append(row)
 
         # Write to file

@@ -442,9 +484,13 @@ def main():
         help="Specify providers to test (comma-separated or space-separated, default: all)",
     )
     parser.add_argument("--output", type=str, help="Output file location (default: tests/verifications/REPORT.md)")
+    parser.add_argument("--k", type=str, help="Keyword expression to filter tests (passed to pytest -k)")
     args = parser.parse_args()
 
     all_results = {}
+    # Initialize collections to aggregate results in main
+    aggregated_providers = defaultdict(set)
+    aggregated_tests = set()
 
     if args.run_tests:
         # Get list of available providers from command line or use detected providers

@@ -463,22 +509,31 @@ def main():
 
     for provider in test_providers:
         provider = provider.strip()  # Remove any whitespace
-        result_file = run_tests(provider)
+        result_file = run_tests(provider, keyword=args.k)
         if result_file:
-            provider_results = parse_results(result_file)
-            all_results.update(provider_results)
+            # Parse and aggregate results
+            parsed_results, providers_in_file, tests_in_file = parse_results(result_file)
+            all_results.update(parsed_results)
+            for prov, models in providers_in_file.items():
+                aggregated_providers[prov].update(models)
+            aggregated_tests.update(tests_in_file)
     else:
         # Use existing results
         provider_result_files = get_latest_results_by_provider()
 
         for result_file in provider_result_files.values():
-            provider_results = parse_results(result_file)
-            all_results.update(provider_results)
+            # Parse and aggregate results
+            parsed_results, providers_in_file, tests_in_file = parse_results(result_file)
+            all_results.update(parsed_results)
+            for prov, models in providers_in_file.items():
+                aggregated_providers[prov].update(models)
+            aggregated_tests.update(tests_in_file)
 
-    # Generate the report
-    generate_report(all_results, args.output)
+    # Generate the report, passing aggregated data
+    generate_report(all_results, aggregated_providers, aggregated_tests, args.output)
 
-    cleanup_old_results()
+    # Cleanup, passing aggregated providers
+    cleanup_old_results(aggregated_providers)
 
 
 if __name__ == "__main__":
@@ -1,97 +0,0 @@
-# Copyright (c) Meta Platforms, Inc. and affiliates.
-# All rights reserved.
-#
-# This source code is licensed under the terms described in the LICENSE file in
-# the root directory of this source tree.
-
-import os
-
-import pytest
-from openai import OpenAI
-
-
-@pytest.fixture
-def providers_model_mapping():
-    """
-    Mapping from model names used in test cases to provider's model names.
-    """
-    return {
-        "fireworks": {
-            "Llama-3.3-70B-Instruct": "accounts/fireworks/models/llama-v3p1-70b-instruct",
-            "Llama-3.2-11B-Vision-Instruct": "accounts/fireworks/models/llama-v3p2-11b-vision-instruct",
-            "Llama-4-Scout-17B-16E-Instruct": "accounts/fireworks/models/llama4-scout-instruct-basic",
-            "Llama-4-Maverick-17B-128E-Instruct": "accounts/fireworks/models/llama4-maverick-instruct-basic",
-        },
-        "together": {
-            "Llama-3.3-70B-Instruct": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
-            "Llama-3.2-11B-Vision-Instruct": "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
-            "Llama-4-Scout-17B-16E-Instruct": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
-            "Llama-4-Maverick-17B-128E-Instruct": "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
-        },
-        "groq": {
-            "Llama-3.3-70B-Instruct": "llama-3.3-70b-versatile",
-            "Llama-3.2-11B-Vision-Instruct": "llama-3.2-11b-vision-preview",
-            "Llama-4-Scout-17B-16E-Instruct": "llama-4-scout-17b-16e-instruct",
-            "Llama-4-Maverick-17B-128E-Instruct": "llama-4-maverick-17b-128e-instruct",
-        },
-        "cerebras": {
-            "Llama-3.3-70B-Instruct": "llama-3.3-70b",
-        },
-        "openai": {
-            "gpt-4o": "gpt-4o",
-            "gpt-4o-mini": "gpt-4o-mini",
-        },
-    }
-
-
-@pytest.fixture
-def provider_metadata():
-    return {
-        "fireworks": ("https://api.fireworks.ai/inference/v1", "FIREWORKS_API_KEY"),
-        "together": ("https://api.together.xyz/v1", "TOGETHER_API_KEY"),
-        "groq": ("https://api.groq.com/openai/v1", "GROQ_API_KEY"),
-        "cerebras": ("https://api.cerebras.ai/v1", "CEREBRAS_API_KEY"),
-        "openai": ("https://api.openai.com/v1", "OPENAI_API_KEY"),
-    }
-
-
-@pytest.fixture
-def provider(request, provider_metadata):
-    provider = request.config.getoption("--provider")
-    base_url = request.config.getoption("--base-url")
-
-    if provider and base_url and provider_metadata[provider][0] != base_url:
-        raise ValueError(f"Provider {provider} is not supported for base URL {base_url}")
-
-    if not provider:
-        if not base_url:
-            raise ValueError("Provider and base URL are not provided")
-        for provider, metadata in provider_metadata.items():
-            if metadata[0] == base_url:
-                provider = provider
-                break
-
-    return provider
-
-
-@pytest.fixture
-def base_url(request, provider, provider_metadata):
-    return request.config.getoption("--base-url") or provider_metadata[provider][0]
-
-
-@pytest.fixture
-def api_key(request, provider, provider_metadata):
-    return request.config.getoption("--api-key") or os.getenv(provider_metadata[provider][1])
-
-
-@pytest.fixture
-def model_mapping(provider, providers_model_mapping):
-    return providers_model_mapping[provider]
-
-
-@pytest.fixture
-def openai_client(base_url, api_key):
-    return OpenAI(
-        base_url=base_url,
-        api_key=api_key,
-    )
@@ -1,202 +0,0 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

from typing import Any

import pytest
from pydantic import BaseModel

from tests.verifications.openai.fixtures.load import load_test_cases

chat_completion_test_cases = load_test_cases("chat_completion")


@pytest.fixture
def correct_model_name(model, provider, providers_model_mapping):
    """Return the provider-specific model name based on the generic model name."""
    mapping = providers_model_mapping[provider]
    if model not in mapping:
        pytest.skip(f"Provider {provider} does not support model {model}")
    return mapping[model]


@pytest.mark.parametrize("model", chat_completion_test_cases["test_chat_basic"]["test_params"]["model"])
@pytest.mark.parametrize(
    "input_output",
    chat_completion_test_cases["test_chat_basic"]["test_params"]["input_output"],
)
def test_chat_non_streaming_basic(openai_client, input_output, correct_model_name):
    response = openai_client.chat.completions.create(
        model=correct_model_name,
        messages=input_output["input"]["messages"],
        stream=False,
    )
    assert response.choices[0].message.role == "assistant"
    assert input_output["output"].lower() in response.choices[0].message.content.lower()


@pytest.mark.parametrize("model", chat_completion_test_cases["test_chat_basic"]["test_params"]["model"])
@pytest.mark.parametrize(
    "input_output",
    chat_completion_test_cases["test_chat_basic"]["test_params"]["input_output"],
)
def test_chat_streaming_basic(openai_client, input_output, correct_model_name):
    response = openai_client.chat.completions.create(
        model=correct_model_name,
        messages=input_output["input"]["messages"],
        stream=True,
    )
    content = ""
    for chunk in response:
        content += chunk.choices[0].delta.content or ""

    # TODO: add detailed type validation

    assert input_output["output"].lower() in content.lower()


@pytest.mark.parametrize("model", chat_completion_test_cases["test_chat_image"]["test_params"]["model"])
@pytest.mark.parametrize(
    "input_output",
    chat_completion_test_cases["test_chat_image"]["test_params"]["input_output"],
)
def test_chat_non_streaming_image(openai_client, input_output, correct_model_name):
    response = openai_client.chat.completions.create(
        model=correct_model_name,
        messages=input_output["input"]["messages"],
        stream=False,
    )
    assert response.choices[0].message.role == "assistant"
    assert input_output["output"].lower() in response.choices[0].message.content.lower()


@pytest.mark.parametrize("model", chat_completion_test_cases["test_chat_image"]["test_params"]["model"])
@pytest.mark.parametrize(
    "input_output",
    chat_completion_test_cases["test_chat_image"]["test_params"]["input_output"],
)
def test_chat_streaming_image(openai_client, input_output, correct_model_name):
    response = openai_client.chat.completions.create(
        model=correct_model_name,
        messages=input_output["input"]["messages"],
        stream=True,
    )
    content = ""
    for chunk in response:
        content += chunk.choices[0].delta.content or ""

    # TODO: add detailed type validation

    assert input_output["output"].lower() in content.lower()


@pytest.mark.parametrize(
    "model",
    chat_completion_test_cases["test_chat_structured_output"]["test_params"]["model"],
)
@pytest.mark.parametrize(
    "input_output",
    chat_completion_test_cases["test_chat_structured_output"]["test_params"]["input_output"],
)
def test_chat_non_streaming_structured_output(openai_client, input_output, correct_model_name):
    response = openai_client.chat.completions.create(
        model=correct_model_name,
        messages=input_output["input"]["messages"],
        response_format=input_output["input"]["response_format"],
        stream=False,
    )

    assert response.choices[0].message.role == "assistant"
    maybe_json_content = response.choices[0].message.content

    validate_structured_output(maybe_json_content, input_output["output"])


@pytest.mark.parametrize(
    "model",
    chat_completion_test_cases["test_chat_structured_output"]["test_params"]["model"],
)
@pytest.mark.parametrize(
    "input_output",
    chat_completion_test_cases["test_chat_structured_output"]["test_params"]["input_output"],
)
def test_chat_streaming_structured_output(openai_client, input_output, correct_model_name):
    response = openai_client.chat.completions.create(
        model=correct_model_name,
        messages=input_output["input"]["messages"],
        response_format=input_output["input"]["response_format"],
        stream=True,
    )
    maybe_json_content = ""
    for chunk in response:
        maybe_json_content += chunk.choices[0].delta.content or ""
    validate_structured_output(maybe_json_content, input_output["output"])


@pytest.mark.parametrize(
    "model",
    chat_completion_test_cases["test_tool_calling"]["test_params"]["model"],
)
@pytest.mark.parametrize(
    "input_output",
    chat_completion_test_cases["test_tool_calling"]["test_params"]["input_output"],
)
def test_chat_non_streaming_tool_calling(openai_client, input_output, correct_model_name):
    response = openai_client.chat.completions.create(
        model=correct_model_name,
        messages=input_output["input"]["messages"],
        tools=input_output["input"]["tools"],
        stream=False,
    )

    assert response.choices[0].message.role == "assistant"
    assert len(response.choices[0].message.tool_calls) > 0
    assert input_output["output"] == "get_weather_tool_call"
    assert response.choices[0].message.tool_calls[0].function.name == "get_weather"
    # TODO: add detailed type validation


def get_structured_output(maybe_json_content: str, schema_name: str) -> Any | None:
    if schema_name == "valid_calendar_event":

        class CalendarEvent(BaseModel):
            name: str
            date: str
            participants: list[str]

        try:
            calendar_event = CalendarEvent.model_validate_json(maybe_json_content)
            return calendar_event
        except Exception:
            return None
    elif schema_name == "valid_math_reasoning":

        class Step(BaseModel):
            explanation: str
            output: str

        class MathReasoning(BaseModel):
            steps: list[Step]
            final_answer: str

        try:
            math_reasoning = MathReasoning.model_validate_json(maybe_json_content)
            return math_reasoning
        except Exception:
            return None

    return None


def validate_structured_output(maybe_json_content: str, schema_name: str) -> None:
    structured_output = get_structured_output(maybe_json_content, schema_name)
    assert structured_output is not None
    if schema_name == "valid_calendar_event":
        assert structured_output.name is not None
        assert structured_output.date is not None
        assert len(structured_output.participants) == 2
    elif schema_name == "valid_math_reasoning":
        assert len(structured_output.final_answer) > 0
105
tests/verifications/openai_api/fixtures/fixtures.py
Normal file
@@ -0,0 +1,105 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

import os
from pathlib import Path

import pytest
import yaml
from openai import OpenAI


# --- Helper Function to Load Config ---
def _load_all_verification_configs():
    """Load and aggregate verification configs from the conf/ directory."""
    # Note: Path is relative to *this* file (fixtures.py)
    conf_dir = Path(__file__).parent.parent.parent / "conf"
    if not conf_dir.is_dir():
        # Use pytest.fail if called during test collection, otherwise raise error
        # For simplicity here, we'll raise an error, assuming direct calls
        # are less likely or can handle it.
        raise FileNotFoundError(f"Verification config directory not found at {conf_dir}")

    all_provider_configs = {}
    yaml_files = list(conf_dir.glob("*.yaml"))
    if not yaml_files:
        raise FileNotFoundError(f"No YAML configuration files found in {conf_dir}")

    for config_path in yaml_files:
        provider_name = config_path.stem
        try:
            with open(config_path, "r") as f:
                provider_config = yaml.safe_load(f)
                if provider_config:
                    all_provider_configs[provider_name] = provider_config
                else:
                    # Log warning if possible, or just skip empty files silently
                    print(f"Warning: Config file {config_path} is empty or invalid.")
        except Exception as e:
            raise IOError(f"Error loading config file {config_path}: {e}") from e

    return {"providers": all_provider_configs}


# --- End Helper Function ---


@pytest.fixture(scope="session")
def verification_config():
    """Pytest fixture to provide the loaded verification config."""
    try:
        return _load_all_verification_configs()
    except (FileNotFoundError, IOError) as e:
        pytest.fail(str(e))  # Fail test collection if config loading fails


@pytest.fixture
def provider(request, verification_config):
    provider = request.config.getoption("--provider")
    base_url = request.config.getoption("--base-url")

    if provider and base_url and verification_config["providers"][provider]["base_url"] != base_url:
        raise ValueError(f"Provider {provider} is not supported for base URL {base_url}")

    if not provider:
        if not base_url:
            raise ValueError("Provider and base URL are not provided")
        for provider, metadata in verification_config["providers"].items():
            if metadata["base_url"] == base_url:
                provider = provider
                break

    return provider


@pytest.fixture
def base_url(request, provider, verification_config):
    return request.config.getoption("--base-url") or verification_config["providers"][provider]["base_url"]


@pytest.fixture
def api_key(request, provider, verification_config):
    provider_conf = verification_config.get("providers", {}).get(provider, {})
    api_key_env_var = provider_conf.get("api_key_var")

    key_from_option = request.config.getoption("--api-key")
    key_from_env = os.getenv(api_key_env_var) if api_key_env_var else None

    final_key = key_from_option or key_from_env
    return final_key


@pytest.fixture
def model_mapping(provider, providers_model_mapping):
    return providers_model_mapping[provider]


@pytest.fixture
def openai_client(base_url, api_key):
    return OpenAI(
        base_url=base_url,
        api_key=api_key,
    )
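`_load_all_verification_configs()` keys each entry by the YAML file's stem. A hedged sketch of the loaded structure follows, using one hypothetical provider entry: the `base_url`/`api_key_var` values mirror the deleted `provider_metadata` fixture above, while `models` and `test_exclusions` are the keys the new test module reads.

```python
# Assumed shape of the dict returned by _load_all_verification_configs();
# the "openai" entry is illustrative, not a real conf/openai.yaml.
example_verification_config = {
    "providers": {
        "openai": {                      # provider name = YAML file stem
            "base_url": "https://api.openai.com/v1",
            "api_key_var": "OPENAI_API_KEY",
            "models": ["gpt-4o", "gpt-4o-mini"],
            "test_exclusions": {},       # model -> list of base test names to skip
        },
    },
}
```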
@@ -1,31 +1,24 @@
test_chat_basic:
  test_name: test_chat_basic
  test_params:
    input_output:
    - input:
    case:
    - case_id: "earth"
      input:
        messages:
        - content: Which planet do humans live on?
          role: user
      output: Earth
    - input:
    - case_id: "saturn"
      input:
        messages:
        - content: Which planet has rings around it with a name starting with letter
            S?
          role: user
      output: Saturn
    model:
    - Llama-3.3-8B-Instruct
    - Llama-3.3-70B-Instruct
    - Llama-4-Scout-17B-16E
    - Llama-4-Scout-17B-16E-Instruct
    - Llama-4-Maverick-17B-128E
    - Llama-4-Maverick-17B-128E-Instruct
    - gpt-4o
    - gpt-4o-mini
test_chat_image:
  test_name: test_chat_image
  test_params:
    input_output:
    case:
    - input:
        messages:
        - content:

@@ -36,18 +29,12 @@ test_chat_image:
            type: image_url
          role: user
      output: llama
    model:
    - Llama-4-Scout-17B-16E
    - Llama-4-Scout-17B-16E-Instruct
    - Llama-4-Maverick-17B-128E
    - Llama-4-Maverick-17B-128E-Instruct
    - gpt-4o
    - gpt-4o-mini
test_chat_structured_output:
  test_name: test_chat_structured_output
  test_params:
    input_output:
    - input:
    case:
    - case_id: "calendar"
      input:
        messages:
        - content: Extract the event information.
          role: system

@@ -77,7 +64,8 @@ test_chat_structured_output:
            type: object
          type: json_schema
      output: valid_calendar_event
    - input:
    - case_id: "math"
      input:
        messages:
        - content: You are a helpful math tutor. Guide the user through the solution
            step by step.

@@ -118,19 +106,10 @@ test_chat_structured_output:
            type: object
          type: json_schema
      output: valid_math_reasoning
    model:
    - Llama-3.3-8B-Instruct
    - Llama-3.3-70B-Instruct
    - Llama-4-Scout-17B-16E
    - Llama-4-Scout-17B-16E-Instruct
    - Llama-4-Maverick-17B-128E
    - Llama-4-Maverick-17B-128E-Instruct
    - gpt-4o
    - gpt-4o-mini
test_tool_calling:
  test_name: test_tool_calling
  test_params:
    input_output:
    case:
    - input:
        messages:
        - content: You are a helpful assistant that can use tools to get information.

@@ -152,11 +131,3 @@ test_tool_calling:
            type: object
          type: function
      output: get_weather_tool_call
    model:
    - Llama-3.3-70B-Instruct
    - Llama-4-Scout-17B-16E
    - Llama-4-Scout-17B-16E-Instruct
    - Llama-4-Maverick-17B-128E
    - Llama-4-Maverick-17B-128E-Instruct
    - gpt-4o
    - gpt-4o-mini
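Each `case_id` added above ends up as the suffix of the pytest node id (e.g. `[gpt-4o-earth]`). A small sketch mirroring the sanitization that `case_id_generator` in the test module below applies:

```python
import re

def sanitize_case_id(case_id: str) -> str:
    # Non-word characters, or a leading digit, become "_" -- the same rule
    # case_id_generator uses in the new test module.
    return re.sub(r"\W|^(?=\d)", "_", case_id)

assert sanitize_case_id("earth") == "earth"
assert sanitize_case_id("3d-case") == "_3d_case"
```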
271
tests/verifications/openai_api/test_chat_completion.py
Normal file
@@ -0,0 +1,271 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

import re
from typing import Any

import pytest
from pydantic import BaseModel

from tests.verifications.openai_api.fixtures.fixtures import _load_all_verification_configs
from tests.verifications.openai_api.fixtures.load import load_test_cases

chat_completion_test_cases = load_test_cases("chat_completion")


def case_id_generator(case):
    """Generate a test ID from the case's 'case_id' field, or use a default."""
    case_id = case.get("case_id")
    if isinstance(case_id, (str, int)):
        return re.sub(r"\W|^(?=\d)", "_", str(case_id))
    return None


def pytest_generate_tests(metafunc):
    """Dynamically parametrize tests based on the selected provider and config."""
    if "model" in metafunc.fixturenames:
        provider = metafunc.config.getoption("provider")
        if not provider:
            print("Warning: --provider not specified. Skipping model parametrization.")
            metafunc.parametrize("model", [])
            return

        try:
            config_data = _load_all_verification_configs()
        except (FileNotFoundError, IOError) as e:
            print(f"ERROR loading verification configs: {e}")
            config_data = {"providers": {}}

        provider_config = config_data.get("providers", {}).get(provider)
        if provider_config:
            models = provider_config.get("models", [])
            if models:
                metafunc.parametrize("model", models)
            else:
                print(f"Warning: No models found for provider '{provider}' in config.")
                metafunc.parametrize("model", [])  # Parametrize empty if no models found
        else:
            print(f"Warning: Provider '{provider}' not found in config. No models parametrized.")
            metafunc.parametrize("model", [])  # Parametrize empty if provider not found


def should_skip_test(verification_config, provider, model, test_name_base):
    """Check if a test should be skipped based on config exclusions."""
    provider_config = verification_config.get("providers", {}).get(provider)
    if not provider_config:
        return False  # No config for provider, don't skip

    exclusions = provider_config.get("test_exclusions", {}).get(model, [])
    return test_name_base in exclusions


# Helper to get the base test name from the request object
def get_base_test_name(request):
    return request.node.originalname


# --- Test Functions ---


@pytest.mark.parametrize(
    "case",
    chat_completion_test_cases["test_chat_basic"]["test_params"]["case"],
    ids=case_id_generator,
)
def test_chat_non_streaming_basic(request, openai_client, model, provider, verification_config, case):
    test_name_base = get_base_test_name(request)
    if should_skip_test(verification_config, provider, model, test_name_base):
        pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.")

    response = openai_client.chat.completions.create(
        model=model,
        messages=case["input"]["messages"],
        stream=False,
    )
    assert response.choices[0].message.role == "assistant"
    assert case["output"].lower() in response.choices[0].message.content.lower()


@pytest.mark.parametrize(
    "case",
    chat_completion_test_cases["test_chat_basic"]["test_params"]["case"],
    ids=case_id_generator,
)
def test_chat_streaming_basic(request, openai_client, model, provider, verification_config, case):
    test_name_base = get_base_test_name(request)
    if should_skip_test(verification_config, provider, model, test_name_base):
        pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.")

    response = openai_client.chat.completions.create(
        model=model,
        messages=case["input"]["messages"],
        stream=True,
    )
    content = ""
    for chunk in response:
        content += chunk.choices[0].delta.content or ""

    # TODO: add detailed type validation

    assert case["output"].lower() in content.lower()


@pytest.mark.parametrize(
    "case",
    chat_completion_test_cases["test_chat_image"]["test_params"]["case"],
    ids=case_id_generator,
)
def test_chat_non_streaming_image(request, openai_client, model, provider, verification_config, case):
    test_name_base = get_base_test_name(request)
    if should_skip_test(verification_config, provider, model, test_name_base):
        pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.")

    response = openai_client.chat.completions.create(
        model=model,
        messages=case["input"]["messages"],
        stream=False,
    )
    assert response.choices[0].message.role == "assistant"
    assert case["output"].lower() in response.choices[0].message.content.lower()


@pytest.mark.parametrize(
    "case",
    chat_completion_test_cases["test_chat_image"]["test_params"]["case"],
    ids=case_id_generator,
)
def test_chat_streaming_image(request, openai_client, model, provider, verification_config, case):
    test_name_base = get_base_test_name(request)
    if should_skip_test(verification_config, provider, model, test_name_base):
        pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.")

    response = openai_client.chat.completions.create(
        model=model,
        messages=case["input"]["messages"],
        stream=True,
    )
    content = ""
    for chunk in response:
        content += chunk.choices[0].delta.content or ""

    # TODO: add detailed type validation

    assert case["output"].lower() in content.lower()


@pytest.mark.parametrize(
    "case",
    chat_completion_test_cases["test_chat_structured_output"]["test_params"]["case"],
    ids=case_id_generator,
)
def test_chat_non_streaming_structured_output(request, openai_client, model, provider, verification_config, case):
    test_name_base = get_base_test_name(request)
    if should_skip_test(verification_config, provider, model, test_name_base):
        pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.")

    response = openai_client.chat.completions.create(
        model=model,
        messages=case["input"]["messages"],
        response_format=case["input"]["response_format"],
        stream=False,
    )

    assert response.choices[0].message.role == "assistant"
    maybe_json_content = response.choices[0].message.content

    validate_structured_output(maybe_json_content, case["output"])


@pytest.mark.parametrize(
    "case",
    chat_completion_test_cases["test_chat_structured_output"]["test_params"]["case"],
    ids=case_id_generator,
)
def test_chat_streaming_structured_output(request, openai_client, model, provider, verification_config, case):
    test_name_base = get_base_test_name(request)
    if should_skip_test(verification_config, provider, model, test_name_base):
        pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.")

    response = openai_client.chat.completions.create(
        model=model,
        messages=case["input"]["messages"],
        response_format=case["input"]["response_format"],
        stream=True,
    )
    maybe_json_content = ""
    for chunk in response:
        maybe_json_content += chunk.choices[0].delta.content or ""
    validate_structured_output(maybe_json_content, case["output"])


@pytest.mark.parametrize(
    "case",
    chat_completion_test_cases["test_tool_calling"]["test_params"]["case"],
    ids=case_id_generator,
)
def test_chat_non_streaming_tool_calling(request, openai_client, model, provider, verification_config, case):
    test_name_base = get_base_test_name(request)
    if should_skip_test(verification_config, provider, model, test_name_base):
        pytest.skip(f"Skipping {test_name_base} for model {model} on provider {provider} based on config.")

    response = openai_client.chat.completions.create(
        model=model,
        messages=case["input"]["messages"],
        tools=case["input"]["tools"],
        stream=False,
    )

    assert response.choices[0].message.role == "assistant"
    assert len(response.choices[0].message.tool_calls) > 0
    assert case["output"] == "get_weather_tool_call"
    assert response.choices[0].message.tool_calls[0].function.name == "get_weather"
    # TODO: add detailed type validation


# --- Helper functions (structured output validation) ---


def get_structured_output(maybe_json_content: str, schema_name: str) -> Any | None:
    if schema_name == "valid_calendar_event":

        class CalendarEvent(BaseModel):
            name: str
            date: str
            participants: list[str]

        try:
            calendar_event = CalendarEvent.model_validate_json(maybe_json_content)
            return calendar_event
        except Exception:
            return None
    elif schema_name == "valid_math_reasoning":

        class Step(BaseModel):
            explanation: str
            output: str

        class MathReasoning(BaseModel):
            steps: list[Step]
            final_answer: str

        try:
            math_reasoning = MathReasoning.model_validate_json(maybe_json_content)
            return math_reasoning
        except Exception:
            return None

    return None


def validate_structured_output(maybe_json_content: str, schema_name: str) -> None:
    structured_output = get_structured_output(maybe_json_content, schema_name)
    assert structured_output is not None
    if schema_name == "valid_calendar_event":
        assert structured_output.name is not None
        assert structured_output.date is not None
        assert len(structured_output.participants) == 2
    elif schema_name == "valid_math_reasoning":
        assert len(structured_output.final_answer) > 0
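Putting the module together: `pytest_generate_tests` supplies `model` from the provider's config, the `case` parametrization supplies the case id, and the combination yields node ids such as `test_chat_non_streaming_basic[gpt-4o-earth]`, which is exactly what the JSON reports below record. A hedged sketch of an invocation that would produce such a report; the `--json-report` flags assume the pytest-json-report plugin, and the repo's own runner may wrap this differently:

```python
import subprocess

# Assumed invocation, shown for illustration only.
subprocess.run(
    [
        "python", "-m", "pytest",
        "tests/verifications/openai_api/test_chat_completion.py",
        "--provider=openai",                      # option consumed by the fixtures above
        "--json-report",                          # pytest-json-report plugin (assumed)
        "--json-report-file=openai_results.json",
    ],
    check=False,
)
```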
File diff suppressed because it is too large
1329
tests/verifications/test_results/fireworks_1744264202.json
Normal file
File diff suppressed because it is too large
File diff suppressed because it is too large
868
tests/verifications/test_results/openai_1744264304.json
Normal file
@@ -0,0 +1,868 @@
{
  "created": 1744264338.9923031,
  "duration": 32.825536012649536,
  "exitcode": 0,
  "root": "/Users/erichuang/projects/llama-stack",
  "environment": {},
  "summary": { "passed": 22, "total": 22, "collected": 22 },
  "collectors": [
    { "nodeid": "", "outcome": "passed",
      "result": [ { "nodeid": "tests/verifications/openai_api/test_chat_completion.py", "type": "Module" } ] },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py", "outcome": "passed", "result": [
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-earth]", "type": "Function", "lineno": 72 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-saturn]", "type": "Function", "lineno": 72 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-earth]", "type": "Function", "lineno": 72 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-saturn]", "type": "Function", "lineno": 72 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-earth]", "type": "Function", "lineno": 91 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-saturn]", "type": "Function", "lineno": 91 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-earth]", "type": "Function", "lineno": 91 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-saturn]", "type": "Function", "lineno": 91 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-case0]", "type": "Function", "lineno": 115 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-mini-case0]", "type": "Function", "lineno": 115 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-case0]", "type": "Function", "lineno": 134 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-mini-case0]", "type": "Function", "lineno": 134 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-calendar]", "type": "Function", "lineno": 158 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-math]", "type": "Function", "lineno": 158 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-calendar]", "type": "Function", "lineno": 158 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-math]", "type": "Function", "lineno": 158 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-calendar]", "type": "Function", "lineno": 181 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-math]", "type": "Function", "lineno": 181 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-calendar]", "type": "Function", "lineno": 181 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-math]", "type": "Function", "lineno": 181 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-case0]", "type": "Function", "lineno": 203 },
      { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-mini-case0]", "type": "Function", "lineno": 203 }
    ] }
  ],
  "tests": [
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-earth]", "lineno": 72, "outcome": "passed",
      "keywords": ["test_chat_non_streaming_basic[gpt-4o-earth]", "parametrize", "pytestmark", "gpt-4o-earth", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o", "case_id": "earth" },
      "setup": { "duration": 0.05381445901002735, "outcome": "passed" },
      "call": { "duration": 0.49848275003023446, "outcome": "passed" },
      "teardown": { "duration": 0.00018287496641278267, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-saturn]", "lineno": 72, "outcome": "passed",
      "keywords": ["test_chat_non_streaming_basic[gpt-4o-saturn]", "parametrize", "pytestmark", "gpt-4o-saturn", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o", "case_id": "saturn" },
      "setup": { "duration": 0.007965500000864267, "outcome": "passed" },
      "call": { "duration": 0.9293275829404593, "outcome": "passed" },
      "teardown": { "duration": 0.00018229195848107338, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-earth]", "lineno": 72, "outcome": "passed",
      "keywords": ["test_chat_non_streaming_basic[gpt-4o-mini-earth]", "parametrize", "pytestmark", "gpt-4o-mini-earth", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o-mini", "case_id": "earth" },
      "setup": { "duration": 0.00875679193995893, "outcome": "passed" },
      "call": { "duration": 0.5793640419142321, "outcome": "passed" },
      "teardown": { "duration": 0.0005307920509949327, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_basic[gpt-4o-mini-saturn]", "lineno": 72, "outcome": "passed",
      "keywords": ["test_chat_non_streaming_basic[gpt-4o-mini-saturn]", "parametrize", "pytestmark", "gpt-4o-mini-saturn", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o-mini", "case_id": "saturn" },
      "setup": { "duration": 0.01076845801435411, "outcome": "passed" },
      "call": { "duration": 0.8752291660057381, "outcome": "passed" },
      "teardown": { "duration": 0.0004834589781239629, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-earth]", "lineno": 91, "outcome": "passed",
      "keywords": ["test_chat_streaming_basic[gpt-4o-earth]", "parametrize", "pytestmark", "gpt-4o-earth", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o", "case_id": "earth" },
      "setup": { "duration": 0.01662245800253004, "outcome": "passed" },
      "call": { "duration": 0.8336971249664202, "outcome": "passed" },
      "teardown": { "duration": 0.0024086670018732548, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-saturn]", "lineno": 91, "outcome": "passed",
      "keywords": ["test_chat_streaming_basic[gpt-4o-saturn]", "parametrize", "pytestmark", "gpt-4o-saturn", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o", "case_id": "saturn" },
      "setup": { "duration": 0.009416291955858469, "outcome": "passed" },
      "call": { "duration": 0.43594495789147913, "outcome": "passed" },
      "teardown": { "duration": 0.0009131249971687794, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-earth]", "lineno": 91, "outcome": "passed",
      "keywords": ["test_chat_streaming_basic[gpt-4o-mini-earth]", "parametrize", "pytestmark", "gpt-4o-mini-earth", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o-mini", "case_id": "earth" },
      "setup": { "duration": 0.013155042077414691, "outcome": "passed" },
      "call": { "duration": 0.6119836670113727, "outcome": "passed" },
      "teardown": { "duration": 0.00023804197553545237, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_basic[gpt-4o-mini-saturn]", "lineno": 91, "outcome": "passed",
      "keywords": ["test_chat_streaming_basic[gpt-4o-mini-saturn]", "parametrize", "pytestmark", "gpt-4o-mini-saturn", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o-mini", "case_id": "saturn" },
      "setup": { "duration": 0.009004916995763779, "outcome": "passed" },
      "call": { "duration": 0.8327413749648258, "outcome": "passed" },
      "teardown": { "duration": 0.00046841695439070463, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-case0]", "lineno": 115, "outcome": "passed",
      "keywords": ["test_chat_non_streaming_image[gpt-4o-case0]", "parametrize", "pytestmark", "gpt-4o-case0", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o", "case_id": "case0" },
      "setup": { "duration": 0.009574208059348166, "outcome": "passed" },
      "call": { "duration": 2.221839000005275, "outcome": "passed" },
      "teardown": { "duration": 0.00015945907216519117, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_image[gpt-4o-mini-case0]", "lineno": 115, "outcome": "passed",
      "keywords": ["test_chat_non_streaming_image[gpt-4o-mini-case0]", "parametrize", "pytestmark", "gpt-4o-mini-case0", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o-mini", "case_id": "case0" },
      "setup": { "duration": 0.0084402080392465, "outcome": "passed" },
      "call": { "duration": 2.298736457945779, "outcome": "passed" },
      "teardown": { "duration": 0.0002423750702291727, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-case0]", "lineno": 134, "outcome": "passed",
      "keywords": ["test_chat_streaming_image[gpt-4o-case0]", "parametrize", "pytestmark", "gpt-4o-case0", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o", "case_id": "case0" },
      "setup": { "duration": 0.007330416003242135, "outcome": "passed" },
      "call": { "duration": 4.062959833070636, "outcome": "passed" },
      "teardown": { "duration": 0.00015470804646611214, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_image[gpt-4o-mini-case0]", "lineno": 134, "outcome": "passed",
      "keywords": ["test_chat_streaming_image[gpt-4o-mini-case0]", "parametrize", "pytestmark", "gpt-4o-mini-case0", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o-mini", "case_id": "case0" },
      "setup": { "duration": 0.019998832955025136, "outcome": "passed" },
      "call": { "duration": 2.609432084020227, "outcome": "passed" },
      "teardown": { "duration": 0.005618917057290673, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-calendar]", "lineno": 158, "outcome": "passed",
      "keywords": ["test_chat_non_streaming_structured_output[gpt-4o-calendar]", "parametrize", "pytestmark", "gpt-4o-calendar", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o", "case_id": "calendar" },
      "setup": { "duration": 0.00867662497330457, "outcome": "passed" },
      "call": { "duration": 0.6856697499752045, "outcome": "passed" },
      "teardown": { "duration": 0.00018445902969688177, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-math]", "lineno": 158, "outcome": "passed",
      "keywords": ["test_chat_non_streaming_structured_output[gpt-4o-math]", "parametrize", "pytestmark", "gpt-4o-math", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o", "case_id": "math" },
      "setup": { "duration": 0.01139050000347197, "outcome": "passed" },
      "call": { "duration": 2.764390083961189, "outcome": "passed" },
      "teardown": { "duration": 0.0003164170775562525, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-calendar]", "lineno": 158, "outcome": "passed",
      "keywords": ["test_chat_non_streaming_structured_output[gpt-4o-mini-calendar]", "parametrize", "pytestmark", "gpt-4o-mini-calendar", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o-mini", "case_id": "calendar" },
      "setup": { "duration": 0.01321374997496605, "outcome": "passed" },
      "call": { "duration": 0.8284227909753099, "outcome": "passed" },
      "teardown": { "duration": 0.00030170800164341927, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_structured_output[gpt-4o-mini-math]", "lineno": 158, "outcome": "passed",
      "keywords": ["test_chat_non_streaming_structured_output[gpt-4o-mini-math]", "parametrize", "pytestmark", "gpt-4o-mini-math", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o-mini", "case_id": "math" },
      "setup": { "duration": 0.013477458036504686, "outcome": "passed" },
      "call": { "duration": 2.4146235829684883, "outcome": "passed" },
      "teardown": { "duration": 0.00025754200760275126, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-calendar]", "lineno": 181, "outcome": "passed",
      "keywords": ["test_chat_streaming_structured_output[gpt-4o-calendar]", "parametrize", "pytestmark", "gpt-4o-calendar", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o", "case_id": "calendar" },
      "setup": { "duration": 0.006940583931282163, "outcome": "passed" },
      "call": { "duration": 0.5102092920569703, "outcome": "passed" },
      "teardown": { "duration": 0.00023379107005894184, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-math]", "lineno": 181, "outcome": "passed",
      "keywords": ["test_chat_streaming_structured_output[gpt-4o-math]", "parametrize", "pytestmark", "gpt-4o-math", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o", "case_id": "math" },
      "setup": { "duration": 0.007166999974288046, "outcome": "passed" },
      "call": { "duration": 3.5751801669830456, "outcome": "passed" },
      "teardown": { "duration": 0.00015041697770357132, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-calendar]", "lineno": 181, "outcome": "passed",
      "keywords": ["test_chat_streaming_structured_output[gpt-4o-mini-calendar]", "parametrize", "pytestmark", "gpt-4o-mini-calendar", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o-mini", "case_id": "calendar" },
      "setup": { "duration": 0.010652625001966953, "outcome": "passed" },
      "call": { "duration": 0.6648182499920949, "outcome": "passed" },
      "teardown": { "duration": 0.0008647920330986381, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_streaming_structured_output[gpt-4o-mini-math]", "lineno": 181, "outcome": "passed",
      "keywords": ["test_chat_streaming_structured_output[gpt-4o-mini-math]", "parametrize", "pytestmark", "gpt-4o-mini-math", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o-mini", "case_id": "math" },
      "setup": { "duration": 0.007372208056040108, "outcome": "passed" },
      "call": { "duration": 2.80747462506406, "outcome": "passed" },
      "teardown": { "duration": 0.00028124998789280653, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-case0]", "lineno": 203, "outcome": "passed",
      "keywords": ["test_chat_non_streaming_tool_calling[gpt-4o-case0]", "parametrize", "pytestmark", "gpt-4o-case0", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o", "case_id": "case0" },
      "setup": { "duration": 0.01625587500166148, "outcome": "passed" },
      "call": { "duration": 0.6878769160248339, "outcome": "passed" },
      "teardown": { "duration": 0.0002637499710544944, "outcome": "passed" } },
    { "nodeid": "tests/verifications/openai_api/test_chat_completion.py::test_chat_non_streaming_tool_calling[gpt-4o-mini-case0]", "lineno": 203, "outcome": "passed",
      "keywords": ["test_chat_non_streaming_tool_calling[gpt-4o-mini-case0]", "parametrize", "pytestmark", "gpt-4o-mini-case0", "test_chat_completion.py", "openai_api", "verifications", "tests", "llama-stack", ""],
      "metadata": { "model": "gpt-4o-mini", "case_id": "case0" },
      "setup": { "duration": 0.008817250025458634, "outcome": "passed" },
      "call": { "duration": 0.7181202919455245, "outcome": "passed" },
      "teardown": { "duration": 0.0017147079342976213, "outcome": "passed" } }
  ]
}
File diff suppressed because it is too large
1420
tests/verifications/test_results/together_1744264258.json
Normal file
File diff suppressed because it is too large
178
uv.lock
generated
@@ -1,4 +1,5 @@
version = 1
revision = 1
requires-python = ">=3.10"
resolution-markers = [
    "(python_full_version < '3.11' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version < '3.11' and sys_platform != 'darwin' and sys_platform != 'linux')",

@@ -139,6 +140,22 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/7e/b3/6b4067be973ae96ba0d615946e314c5ae35f9f993eca561b356540bb0c2b/alabaster-1.0.0-py3-none-any.whl", hash = "sha256:fc6786402dc3fcb2de3cabd5fe455a2db534b371124f1f21de8731783dec828b", size = 13929 },
]

[[package]]
name = "altair"
version = "5.5.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
    { name = "jinja2" },
    { name = "jsonschema" },
    { name = "narwhals" },
    { name = "packaging" },
    { name = "typing-extensions", marker = "python_full_version < '3.14'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/16/b1/f2969c7bdb8ad8bbdda031687defdce2c19afba2aa2c8e1d2a17f78376d8/altair-5.5.0.tar.gz", hash = "sha256:d960ebe6178c56de3855a68c47b516be38640b73fb3b5111c2a9ca90546dd73d", size = 705305 }
wheels = [
    { url = "https://files.pythonhosted.org/packages/aa/f3/0b6ced594e51cc95d8c1fc1640d3623770d01e4969d29c0bd09945fafefa/altair-5.5.0-py3-none-any.whl", hash = "sha256:91a310b926508d560fe0148d02a194f38b824122641ef528113d029fcd129f8c", size = 731200 },
]

[[package]]
name = "annotated-types"
version = "0.7.0"

@@ -258,6 +275,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/09/71/54e999902aed72baf26bca0d50781b01838251a462612966e9fc4891eadd/black-25.1.0-py3-none-any.whl", hash = "sha256:95e8176dae143ba9097f351d174fdaf0ccd29efb414b362ae3fd72bf0f710717", size = 207646 },
]

[[package]]
name = "blinker"
version = "1.9.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/21/28/9b3f50ce0e048515135495f198351908d99540d69bfdc8c1d15b73dc55ce/blinker-1.9.0.tar.gz", hash = "sha256:b4ce2265a7abece45e7cc896e98dbebe6cead56bcf805a3d23136d145f5445bf", size = 22460 }
wheels = [
    { url = "https://files.pythonhosted.org/packages/10/cb/f2ad4230dc2eb1a74edf38f1a38b9b52277f75bef262d8908e60d957e13c/blinker-1.9.0-py3-none-any.whl", hash = "sha256:ba0efaa9080b619ff2f3459d1d500c57bddea4a6b424b60a91141db6fd2f08bc", size = 8458 },
]

[[package]]
name = "blobfile"
version = "3.0.0"

@@ -282,6 +308,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/b3/58/a255894436f3eca4a20611785a30a43b85bc75adf1b77f227e1e6d0cce0a/braintrust_core-0.0.58-py3-none-any.whl", hash = "sha256:fa272b70376d2c6692acf00ebd9fb9bae057b0c53b2b6a59a64850bf79757311", size = 4438 },
]

[[package]]
name = "cachetools"
version = "5.5.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/6c/81/3747dad6b14fa2cf53fcf10548cf5aea6913e96fab41a3c198676f8948a5/cachetools-5.5.2.tar.gz", hash = "sha256:1a661caa9175d26759571b2e19580f9d6393969e5dfca11fdb1f947a23e640d4", size = 28380 }
wheels = [
    { url = "https://files.pythonhosted.org/packages/72/76/20fa66124dbe6be5cafeb312ece67de6b61dd91a0247d1ea13db4ebb33c2/cachetools-5.5.2-py3-none-any.whl", hash = "sha256:d26a22bcc62eb95c3beabd9f1ee5e820d3d2704fe2967cbe350e20c8ffcd3f0a", size = 10080 },
]

[[package]]
name = "certifi"
version = "2025.1.31"

@@ -783,6 +818,30 @@ http = [
    { name = "aiohttp" },
]

[[package]]
name = "gitdb"
version = "4.0.12"
source = { registry = "https://pypi.org/simple" }
dependencies = [
    { name = "smmap" },
]
sdist = { url = "https://files.pythonhosted.org/packages/72/94/63b0fc47eb32792c7ba1fe1b694daec9a63620db1e313033d18140c2320a/gitdb-4.0.12.tar.gz", hash = "sha256:5ef71f855d191a3326fcfbc0d5da835f26b13fbcba60c32c21091c349ffdb571", size = 394684 }
wheels = [
    { url = "https://files.pythonhosted.org/packages/a0/61/5c78b91c3143ed5c14207f463aecfc8f9dbb5092fb2869baf37c273b2705/gitdb-4.0.12-py3-none-any.whl", hash = "sha256:67073e15955400952c6565cc3e707c554a4eea2e428946f7a4c162fab9bd9bcf", size = 62794 },
]

[[package]]
name = "gitpython"
version = "3.1.44"
source = { registry = "https://pypi.org/simple" }
dependencies = [
    { name = "gitdb" },
]
sdist = { url = "https://files.pythonhosted.org/packages/c0/89/37df0b71473153574a5cdef8f242de422a0f5d26d7a9e231e6f169b4ad14/gitpython-3.1.44.tar.gz", hash = "sha256:c87e30b26253bf5418b01b0660f818967f3c503193838337fe5e573331249269", size = 214196 }
wheels = [
    { url = "https://files.pythonhosted.org/packages/1d/9a/4114a9057db2f1462d5c8f8390ab7383925fe1ac012eaa42402ad65c2963/GitPython-3.1.44-py3-none-any.whl", hash = "sha256:9e0e10cda9bed1ee64bc9a6de50e7e38a9c9943241cd7f585f6df3ed28011110", size = 207599 },
]

[[package]]
name = "googleapis-common-protos"
version = "1.67.0"

@@ -1386,6 +1445,12 @@ test = [
    { name = "torchvision", version = "0.21.0", source = { registry = "https://download.pytorch.org/whl/cpu" }, marker = "(platform_machine == 'aarch64' and sys_platform == 'linux') or sys_platform == 'darwin'" },
    { name = "torchvision", version = "0.21.0+cpu", source = { registry = "https://download.pytorch.org/whl/cpu" }, marker = "(platform_machine != 'aarch64' and sys_platform == 'linux') or (sys_platform != 'darwin' and sys_platform != 'linux')" },
]
ui = [
    { name = "llama-stack-client" },
    { name = "pandas" },
    { name = "streamlit" },
    { name = "streamlit-option-menu" },
]
unit = [
    { name = "aiohttp" },
    { name = "aiosqlite" },

@@ -1416,6 +1481,7 @@ requires-dist = [
    { name = "jinja2", marker = "extra == 'codegen'", specifier = ">=3.1.6" },
    { name = "jsonschema" },
    { name = "llama-stack-client", specifier = ">=0.2.1" },
    { name = "llama-stack-client", marker = "extra == 'ui'", specifier = ">=0.2.1" },
    { name = "mcp", marker = "extra == 'test'" },
    { name = "myst-parser", marker = "extra == 'docs'" },
    { name = "nbval", marker = "extra == 'dev'" },

@@ -1423,6 +1489,7 @@ requires-dist = [
    { name = "openai", marker = "extra == 'unit'" },
    { name = "opentelemetry-exporter-otlp-proto-http", marker = "extra == 'test'" },
    { name = "opentelemetry-sdk", marker = "extra == 'test'" },
    { name = "pandas", marker = "extra == 'ui'" },
    { name = "pillow" },
|
||||
{ name = "pre-commit", marker = "extra == 'dev'" },
|
||||
{ name = "prompt-toolkit" },
|
||||
|
@ -1452,6 +1519,8 @@ requires-dist = [
|
|||
{ name = "sphinxcontrib-redoc", marker = "extra == 'docs'" },
|
||||
{ name = "sphinxcontrib-video", marker = "extra == 'docs'" },
|
||||
{ name = "sqlite-vec", marker = "extra == 'unit'" },
|
||||
{ name = "streamlit", marker = "extra == 'ui'" },
|
||||
{ name = "streamlit-option-menu", marker = "extra == 'ui'" },
|
||||
{ name = "termcolor" },
|
||||
{ name = "tiktoken" },
|
||||
{ name = "tomli", marker = "extra == 'docs'" },
|
||||
|
@ -1461,6 +1530,7 @@ requires-dist = [
|
|||
{ name = "types-setuptools", marker = "extra == 'dev'" },
|
||||
{ name = "uvicorn", marker = "extra == 'dev'" },
|
||||
]
|
||||
provides-extras = ["dev", "unit", "test", "docs", "codegen", "ui"]
|
||||
|
||||
[[package]]
|
||||
name = "llama-stack-client"
|
||||
|
@ -1815,6 +1885,15 @@ wheels = [
|
|||
{ url = "https://files.pythonhosted.org/packages/5f/df/76d0321c3797b54b60fef9ec3bd6f4cfd124b9e422182156a1dd418722cf/myst_parser-4.0.1-py3-none-any.whl", hash = "sha256:9134e88959ec3b5780aedf8a99680ea242869d012e8821db3126d427edc9c95d", size = 84579 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "narwhals"
|
||||
version = "1.34.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/ec/1d/a21496389436e96394a6e3fb1a644d5bc382250baff76e867f0368a94068/narwhals-1.34.0.tar.gz", hash = "sha256:bdd3fa60bea1f1e8b698e483be18dd43af13290da12dba69ea16dc1f3edbb8f7", size = 265432 }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/1e/6d/875d5a7f8e14fc044ede74b94e739d7312c3c8d1a3878f649601b15fdd68/narwhals-1.34.0-py3-none-any.whl", hash = "sha256:9502b9aa5dfe125c090a3a0bbca95becfa1fac2cd67f8b80d12b1dc2ed751865", size = 325346 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "nbformat"
|
||||
version = "5.10.4"
|
||||
|
@ -2571,6 +2650,19 @@ wheels = [
|
|||
{ url = "https://files.pythonhosted.org/packages/0b/53/a64f03044927dc47aafe029c42a5b7aabc38dfb813475e0e1bf71c4a59d0/pydantic_settings-2.8.1-py3-none-any.whl", hash = "sha256:81942d5ac3d905f7f3ee1a70df5dfb62d5569c12f51a5a647defc1c3d9ee2e9c", size = 30839 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "pydeck"
|
||||
version = "0.9.1"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "jinja2" },
|
||||
{ name = "numpy" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/a1/ca/40e14e196864a0f61a92abb14d09b3d3da98f94ccb03b49cf51688140dab/pydeck-0.9.1.tar.gz", hash = "sha256:f74475ae637951d63f2ee58326757f8d4f9cd9f2a457cf42950715003e2cb605", size = 3832240 }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/ab/4c/b888e6cf58bd9db9c93f40d1c6be8283ff49d88919231afe93a6bcf61626/pydeck-0.9.1-py2.py3-none-any.whl", hash = "sha256:b3f75ba0d273fc917094fa61224f3f6076ca8752b93d46faf3bcfd9f9d59b038", size = 6900403 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "pygments"
|
||||
version = "2.19.1"
|
||||
|
@ -3220,6 +3312,15 @@ wheels = [
|
|||
{ url = "https://files.pythonhosted.org/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl", hash = "sha256:4721f391ed90541fddacab5acf947aa0d3dc7d27b2e1e8eda2be8970586c3274", size = 11050 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "smmap"
|
||||
version = "5.0.2"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/44/cd/a040c4b3119bbe532e5b0732286f805445375489fceaec1f48306068ee3b/smmap-5.0.2.tar.gz", hash = "sha256:26ea65a03958fa0c8a1c7e8c7a58fdc77221b8910f6be2131affade476898ad5", size = 22329 }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/04/be/d09147ad1ec7934636ad912901c5fd7667e1c858e19d355237db0d0cd5e4/smmap-5.0.2-py3-none-any.whl", hash = "sha256:b30115f0def7d7531d22a0fb6502488d879e75b260a9db4d0819cfb25403af5e", size = 24303 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "sniffio"
|
||||
version = "1.3.1"
|
||||
|
@ -3502,6 +3603,47 @@ wheels = [
|
|||
{ url = "https://files.pythonhosted.org/packages/d9/61/f2b52e107b1fc8944b33ef56bf6ac4ebbe16d91b94d2b87ce013bf63fb84/starlette-0.45.3-py3-none-any.whl", hash = "sha256:dfb6d332576f136ec740296c7e8bb8c8a7125044e7c6da30744718880cdd059d", size = 71507 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "streamlit"
|
||||
version = "1.44.1"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "altair" },
|
||||
{ name = "blinker" },
|
||||
{ name = "cachetools" },
|
||||
{ name = "click" },
|
||||
{ name = "gitpython" },
|
||||
{ name = "numpy" },
|
||||
{ name = "packaging" },
|
||||
{ name = "pandas" },
|
||||
{ name = "pillow" },
|
||||
{ name = "protobuf" },
|
||||
{ name = "pyarrow" },
|
||||
{ name = "pydeck" },
|
||||
{ name = "requests" },
|
||||
{ name = "tenacity" },
|
||||
{ name = "toml" },
|
||||
{ name = "tornado" },
|
||||
{ name = "typing-extensions" },
|
||||
{ name = "watchdog", marker = "sys_platform != 'darwin'" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/3e/c0/7286284567e5045f0c587c426d0c41aee5d10c0a2e360e627a83037e9f0c/streamlit-1.44.1.tar.gz", hash = "sha256:c6914ed6d5b76870b461510476806db370f36425ae0e6654d227c988288198d3", size = 9423685 }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/eb/17/fc425e1d4d86e31b2aaf0812a2ef2163763a0670d671720c7c36e8679323/streamlit-1.44.1-py3-none-any.whl", hash = "sha256:9fe355f58b11f4eb71e74f115ce1f38c4c9eaff2733e6bcffb510ac1298a5990", size = 9812242 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "streamlit-option-menu"
|
||||
version = "0.4.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "streamlit" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/5e/27/72dc451cdaef1714fd0d75cc430e50a06c12c9046295fdf1f94af1b766eb/streamlit-option-menu-0.4.0.tar.gz", hash = "sha256:48ec69d59e547fa2fa4bfae001620df8af56a80de2f765ddbb9fcbfb84017129", size = 827290 }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/fd/52/2f525ad4262dc83d67297f69ec5afcee1438b9e9ae22aa318396725ddbed/streamlit_option_menu-0.4.0-py3-none-any.whl", hash = "sha256:a55fc7554047b6db371595af2182e435b8a2c715ee6124e8543685bd4670b07e", size = 829255 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "sympy"
|
||||
version = "1.13.1"
|
||||
|
@ -3514,6 +3656,15 @@ wheels = [
|
|||
{ url = "https://files.pythonhosted.org/packages/b2/fe/81695a1aa331a842b582453b605175f419fe8540355886031328089d840a/sympy-1.13.1-py3-none-any.whl", hash = "sha256:db36cdc64bf61b9b24578b6f7bab1ecdd2452cf008f34faa33776680c26d66f8", size = 6189177 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tenacity"
|
||||
version = "9.1.2"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/0a/d4/2b0cd0fe285e14b36db076e78c93766ff1d529d70408bd1d2a5a84f1d929/tenacity-9.1.2.tar.gz", hash = "sha256:1169d376c297e7de388d18b4481760d478b0e99a777cad3a9c86e556f4b697cb", size = 48036 }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/e5/30/643397144bfbfec6f6ef821f36f33e57d35946c44a2352d3c9f0ae847619/tenacity-9.1.2-py3-none-any.whl", hash = "sha256:f77bf36710d8b73a50b2dd155c97b870017ad21afe6ab300326b0371b3b05138", size = 28248 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "termcolor"
|
||||
version = "2.5.0"
|
||||
|
@ -3559,6 +3710,15 @@ wheels = [
|
|||
{ url = "https://files.pythonhosted.org/packages/de/a8/8f499c179ec900783ffe133e9aab10044481679bb9aad78436d239eee716/tiktoken-0.9.0-cp313-cp313-win_amd64.whl", hash = "sha256:5ea0edb6f83dc56d794723286215918c1cde03712cbbafa0348b33448faf5b95", size = 894669 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "toml"
|
||||
version = "0.10.2"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/be/ba/1f744cdc819428fc6b5084ec34d9b30660f6f9daaf70eead706e3203ec3c/toml-0.10.2.tar.gz", hash = "sha256:b3bda1d108d5dd99f4a20d24d9c348e91c4db7ab1b749200bded2f839ccbe68f", size = 22253 }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/44/6f/7120676b6d73228c96e17f1f794d8ab046fc910d781c8d151120c3f1569e/toml-0.10.2-py2.py3-none-any.whl", hash = "sha256:806143ae5bfb6a3c6e736a764057db0e6a0e05e338b5630894a5f779cabb4f9b", size = 16588 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tomli"
|
||||
version = "2.2.1"
|
||||
|
@ -3836,6 +3996,24 @@ wheels = [
|
|||
{ url = "https://files.pythonhosted.org/packages/93/fa/849483d56773ae29740ae70043ad88e068f98a6401aa819b5d6bee604683/virtualenv-20.29.2-py3-none-any.whl", hash = "sha256:febddfc3d1ea571bdb1dc0f98d7b45d24def7428214d4fb73cc486c9568cce6a", size = 4301478 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "watchdog"
|
||||
version = "6.0.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/db/7d/7f3d619e951c88ed75c6037b246ddcf2d322812ee8ea189be89511721d54/watchdog-6.0.0.tar.gz", hash = "sha256:9ddf7c82fda3ae8e24decda1338ede66e1c99883db93711d8fb941eaa2d8c282", size = 131220 }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/a9/c7/ca4bf3e518cb57a686b2feb4f55a1892fd9a3dd13f470fca14e00f80ea36/watchdog-6.0.0-py3-none-manylinux2014_aarch64.whl", hash = "sha256:7607498efa04a3542ae3e05e64da8202e58159aa1fa4acddf7678d34a35d4f13", size = 79079 },
|
||||
{ url = "https://files.pythonhosted.org/packages/5c/51/d46dc9332f9a647593c947b4b88e2381c8dfc0942d15b8edc0310fa4abb1/watchdog-6.0.0-py3-none-manylinux2014_armv7l.whl", hash = "sha256:9041567ee8953024c83343288ccc458fd0a2d811d6a0fd68c4c22609e3490379", size = 79078 },
|
||||
{ url = "https://files.pythonhosted.org/packages/d4/57/04edbf5e169cd318d5f07b4766fee38e825d64b6913ca157ca32d1a42267/watchdog-6.0.0-py3-none-manylinux2014_i686.whl", hash = "sha256:82dc3e3143c7e38ec49d61af98d6558288c415eac98486a5c581726e0737c00e", size = 79076 },
|
||||
{ url = "https://files.pythonhosted.org/packages/ab/cc/da8422b300e13cb187d2203f20b9253e91058aaf7db65b74142013478e66/watchdog-6.0.0-py3-none-manylinux2014_ppc64.whl", hash = "sha256:212ac9b8bf1161dc91bd09c048048a95ca3a4c4f5e5d4a7d1b1a7d5752a7f96f", size = 79077 },
|
||||
{ url = "https://files.pythonhosted.org/packages/2c/3b/b8964e04ae1a025c44ba8e4291f86e97fac443bca31de8bd98d3263d2fcf/watchdog-6.0.0-py3-none-manylinux2014_ppc64le.whl", hash = "sha256:e3df4cbb9a450c6d49318f6d14f4bbc80d763fa587ba46ec86f99f9e6876bb26", size = 79078 },
|
||||
{ url = "https://files.pythonhosted.org/packages/62/ae/a696eb424bedff7407801c257d4b1afda455fe40821a2be430e173660e81/watchdog-6.0.0-py3-none-manylinux2014_s390x.whl", hash = "sha256:2cce7cfc2008eb51feb6aab51251fd79b85d9894e98ba847408f662b3395ca3c", size = 79077 },
|
||||
{ url = "https://files.pythonhosted.org/packages/b5/e8/dbf020b4d98251a9860752a094d09a65e1b436ad181faf929983f697048f/watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl", hash = "sha256:20ffe5b202af80ab4266dcd3e91aae72bf2da48c0d33bdb15c66658e685e94e2", size = 79078 },
|
||||
{ url = "https://files.pythonhosted.org/packages/07/f6/d0e5b343768e8bcb4cda79f0f2f55051bf26177ecd5651f84c07567461cf/watchdog-6.0.0-py3-none-win32.whl", hash = "sha256:07df1fdd701c5d4c8e55ef6cf55b8f0120fe1aef7ef39a1c6fc6bc2e606d517a", size = 79065 },
|
||||
{ url = "https://files.pythonhosted.org/packages/db/d9/c495884c6e548fce18a8f40568ff120bc3a4b7b99813081c8ac0c936fa64/watchdog-6.0.0-py3-none-win_amd64.whl", hash = "sha256:cbafb470cf848d93b5d013e2ecb245d4aa1c8fd0504e863ccefa32445359d680", size = 79070 },
|
||||
{ url = "https://files.pythonhosted.org/packages/33/e8/e40370e6d74ddba47f002a32919d91310d6074130fe4e17dabcafc15cbf1/watchdog-6.0.0-py3-none-win_ia64.whl", hash = "sha256:a1914259fa9e1454315171103c6a30961236f508b9b623eae470268bbcc6a22f", size = 79067 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "watchfiles"
|
||||
version = "1.0.4"
|
||||
|
|
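The uv.lock diff above records a new `ui` extra (see `provides-extras` and the `ui` group pinning `streamlit`, `streamlit-option-menu`, `pandas`, and `llama-stack-client`), along with the transitive Streamlit dependencies (altair, blinker, cachetools, gitdb/gitpython, narwhals, pydeck, smmap, tenacity, toml, watchdog). As a minimal sketch of consuming such an extra with uv (standard `uv` and `streamlit` commands, not commands taken from this commit):

```
# Sync the project environment including the optional "ui" dependency group
uv sync --extra ui

# Streamlit is then available in the project environment; smoke-test it
uv run streamlit hello
```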