[/scoring] add ability to define aggregation functions for scoring functions & refactors (#597)

# What does this PR do? - Add ability to define aggregation functions for scoring functions via `ScoringFnParams` - Supported by `basic` / `regex_parser` / `llm_as_judge` scoring functions ## Test Plan ``` pytest -v -s -m basic_scoring_together_inference scoring/test_scoring.py ``` <img width="855" alt="image" src="https://github.com/user-attachments/assets/12db8e6e-2ad4-462e-b9b9-70ba6c050a6c"> ``` pytest -v -s -m llm_as_judge_scoring_together_inference scoring/test_scoring.py ``` <img width="858" alt="image" src="https://github.com/user-attachments/assets/bf806676-6f5e-456d-be9f-f81a26d1df19"> **Example Response** (`basic`) <img width="863" alt="image" src="https://github.com/user-attachments/assets/0e57a49c-8386-45cc-8fa9-3e61aaa9a3be"> **Example Response** (`llm-as-judge`) <img width="854" alt="image" src="https://github.com/user-attachments/assets/38065bc2-b724-47ed-9535-79b6099c4362"> ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.
2024-12-11 10:03:42 -08:00 · 2024-12-11 10:03:42 -08:00 · a4bcfb8bba
commit a4bcfb8bba
parent e128f2547a
16 changed files with 323 additions and 55 deletions
--- a/llama_stack/providers/inline/scoring/braintrust/braintrust.py
+++ b/llama_stack/providers/inline/scoring/braintrust/braintrust.py
@ -147,7 +147,7 @@ class BraintrustScoringImpl(
                await self.score_row(input_row, scoring_fn_id)
                for input_row in input_rows
            ]
-
+            aggregation_functions = [AggregationFunctionType.average]
            agg_results = aggregate_average(score_results)
            res[scoring_fn_id] = ScoringResult(
                score_rows=score_results,