[Evals API][6/n] meta-reference llm as judge, registration for ScoringFnDefs (#330)

* wip scoring refactor
* llm as judge, move folders
* test full generation + eval
* extract score regex to llm context
* remove prints, cleanup braintrust in this branch
* change json -> class
* remove initialize
* address nits
* check identifier prefix
* update MANIFEST
2024-10-28 14:08:42 -07:00
commit 7b8748c53e
parent 04a4784287
20 changed files with 360 additions and 50 deletions
--- a/llama_stack/providers/tests/datasetio/test_datasetio.py
+++ b/llama_stack/providers/tests/datasetio/test_datasetio.py
@@ -70,6 +70,7 @@ async def register_dataset(
    if for_generation:
        dataset_schema = {
            "expected_answer": StringType(),
+            "input_query": StringType(),
            "chat_completion_input": ChatCompletionInputType(),
        }
    else: