llama-stack-mirror/tests/integration/recordings/responses/f6857bcea729.json
Charlie Doern ddefd158e9 test: add integration tests for query_metrics
the integration tests check if all of the metrics we currently support actual are queryable after inference requests

this also tests thinks like aggregation, label filtering, etc

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-08-22 15:54:33 -04:00

39 lines
2.6 KiB
JSON

{
"request": {
"method": "POST",
"url": "http://localhost:11434/api/generate",
"headers": {},
"body": {
"model": "llama3.2:3b",
"raw": true,
"prompt": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nTest metrics generation 2<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"options": {
"temperature": 0.0
},
"stream": false
},
"endpoint": "/api/generate",
"model": "llama3.2:3b"
},
"response": {
"body": {
"__type__": "ollama._types.GenerateResponse",
"__data__": {
"model": "llama3.2:3b",
"created_at": "2025-08-11T15:56:13.082679Z",
"done": true,
"done_reason": "stop",
"total_duration": 2606245291,
"load_duration": 9979708,
"prompt_eval_count": 21,
"prompt_eval_duration": 23000000,
"eval_count": 321,
"eval_duration": 2572000000,
"response": "Here are some test metrics that can be used to evaluate the performance of a system:\n\n1. **Accuracy**: Measures how close the predicted values are to the actual values.\n2. **Precision**: Measures the proportion of true positives among all positive predictions made by the model.\n3. **Recall**: Measures the proportion of true positives among all actual positive instances.\n4. **F1-score**: The harmonic mean of precision and recall, providing a balanced measure of both.\n5. **Mean Squared Error (MSE)**: Measures the average squared difference between predicted and actual values.\n6. **Mean Absolute Error (MAE)**: Measures the average absolute difference between predicted and actual values.\n7. **Root Mean Squared Percentage Error (RMSPE)**: A variation of MSE that expresses errors as a percentage of the actual value.\n8. **Coefficient of Determination (R-squared, R2)**: Measures how well the model explains the variance in the data.\n9. **Mean Absolute Percentage Error (MAPE)**: Measures the average absolute percentage difference between predicted and actual values.\n10. **Mean Squared Logarithmic Error (MSLE)**: A variation of MSE that is more suitable for skewed distributions.\n\nThese metrics can be used to evaluate different aspects of a system's performance, such as:\n\n* Classification models: accuracy, precision, recall, F1-score\n* Regression models: MSE, MAE, RMSPE, R2\n* Time series forecasting: MAPE, MSLE\n\nNote that the choice of metric depends on the specific problem and data.",
"thinking": null,
"context": null
}
},
"is_streaming": false
}
}