llama-stack-mirror/llama_stack/providers/inline/scoring
Botao Chen b751f7003d
feat: add aggregation_functions to llm_as_judge_405b_simpleqa (#1164)
As the title says: let the scoring function llm_as_judge_405b_simpleqa output
aggregated_results.

We can leverage categorical_count to calculate the percentage of correct
answers as an eval benchmark metric.
2025-02-19 19:42:04 -08:00
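A minimal sketch (not the repository's actual implementation) of how a categorical_count aggregation can turn per-row judge grades into a percent-correct benchmark metric; the grade labels and result field names below are illustrative assumptions.

```python
from collections import Counter

# Hypothetical per-row results, as an LLM-as-judge scoring function such as
# llm_as_judge_405b_simpleqa might emit them (grade labels are assumed).
score_rows = [
    {"score": "CORRECT"},
    {"score": "INCORRECT"},
    {"score": "CORRECT"},
    {"score": "NOT_ATTEMPTED"},
]


def categorical_count(rows: list[dict]) -> dict:
    """Count how many rows fall into each score category."""
    return dict(Counter(row["score"] for row in rows))


counts = categorical_count(score_rows)
# Derive a percent-correct metric from the aggregated counts.
pct_correct = 100.0 * counts.get("CORRECT", 0) / len(score_rows)
print(counts)       # {'CORRECT': 2, 'INCORRECT': 1, 'NOT_ATTEMPTED': 1}
print(pct_correct)  # 50.0
```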
basic         | build: format codebase imports using ruff linter (#1028)              | 2025-02-13 10:06:21 -08:00
braintrust    | build: format codebase imports using ruff linter (#1028)              | 2025-02-13 10:06:21 -08:00
llm_as_judge  | feat: add aggregation_functions to llm_as_judge_405b_simpleqa (#1164) | 2025-02-19 19:42:04 -08:00
__init__.py   | add missing __init__                                                   | 2024-12-03 18:50:18 -08:00