[Evals API][6/n] meta-reference llm as judge, registration for ScoringFnDefs (#330)

* wip scoring refactor
* llm as judge, move folders
* test full generation + eval
* extract score regex to llm context
* remove prints, cleanup braintrust in this branch
* change json -> class
* remove initialize
* address nits
* check identifier prefix
* update MANIFEST
2025-06-27 18:50:41 +00:00 · 2024-10-28 14:08:42 -07:00 · 2024-10-28 14:08:42 -07:00 · 7b8748c53e
commit 7b8748c53e
parent 04a4784287
20 changed files with 360 additions and 50 deletions
--- a/llama_stack/apis/scoring_functions/scoring_functions.py
+++ b/llama_stack/apis/scoring_functions/scoring_functions.py
@@ -26,6 +26,10 @@ class Parameter(BaseModel):
 class LLMAsJudgeContext(BaseModel):
     judge_model: str
     prompt_template: Optional[str] = None
+    judge_score_regex: Optional[List[str]] = Field(
+        description="Regex to extract the score from the judge response",
+        default=None,
+    )


 @json_schema_type
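The hunk adds `judge_score_regex`, an optional list of patterns for pulling a numeric score out of the judge model's free-text reply. The snippet below is a minimal sketch of how a scorer might use such a list, under a few assumptions. It uses a stdlib `dataclass` as a stand-in for the Pydantic `LLMAsJudgeContext`. `JudgeContextSketch` and `extract_judge_score` are hypothetical names. The example patterns are illustrative and not taken from this commit. Patterns are tried in order, and each is assumed to capture an integer in group 1.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JudgeContextSketch:
    """Hypothetical stand-in for LLMAsJudgeContext (the real class is a Pydantic model)."""
    judge_model: str
    prompt_template: Optional[str] = None
    # Ordered list of patterns; group(1) of each is assumed to capture an integer score.
    judge_score_regex: Optional[List[str]] = None


def extract_judge_score(response: str, context: JudgeContextSketch) -> Optional[int]:
    """Hypothetical helper: return the score from the first pattern that matches, else None."""
    for pattern in context.judge_score_regex or []:
        match = re.search(pattern, response)
        if match:
            return int(match.group(1))
    # No patterns configured, or none matched the judge's reply.
    return None


# Illustrative usage; the model name and patterns are placeholders.
ctx = JudgeContextSketch(
    judge_model="judge-model-name",
    judge_score_regex=[r"Total rating: (\d+)", r"rating: (\d+)", r"Rating: (\d+)"],
)
print(extract_judge_score("Feedback: good.\nTotal rating: 4", ctx))  # → 4
```

Storing a list rather than a single pattern lets one scoring function tolerate several phrasings of the judge's answer. Putting the most specific pattern first keeps a looser pattern from matching an unrelated number earlier in the reply.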