llama-stack

Botao Chen b751f7003d feat: add aggregation_functions to llm_as_judge_405b_simpleqa (#1164) As the title says, this lets the scoring function llm_as_judge_405b_simpleqa output aggregated_results. We can use categorical_count to calculate the percentage of correct answers as an eval benchmark metric.	2025-02-19 19:42:04 -08:00
basic	build: format codebase imports using ruff linter (#1028)	2025-02-13 10:06:21 -08:00
braintrust	build: format codebase imports using ruff linter (#1028)	2025-02-13 10:06:21 -08:00
llm_as_judge	feat: add aggregation_functions to llm_as_judge_405b_simpleqa (#1164)	2025-02-19 19:42:04 -08:00
__init__.py	add missing __init__	2024-12-03 18:50:18 -08:00