feat(eval api): (2.2/n) delete eval / scoring / scoring_fn apis (#1700)

# What does this PR do?
- To make it easier, delete existing `eval/scoring/scoring_function` apis. There will be a bunch of broken impls here. The sequence is:
  1. migrate benchmark graders
  2. clean up existing scoring functions
- Add a skeleton evaluation impl to make tests pass.

## Test Plan
tested in following PRs

[//]: # (## Documentation)
2025-03-19 11:04:23 -07:00
commit c1d18283d2
parent 0048274ec0
113 changed files with 408 additions and 3900 deletions
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -168,7 +168,6 @@ exclude = [
    "^llama_stack/apis/common/training_types\\.py$",
    "^llama_stack/apis/datasetio/datasetio\\.py$",
    "^llama_stack/apis/datasets/datasets\\.py$",
-    "^llama_stack/apis/eval/eval\\.py$",
    "^llama_stack/apis/files/files\\.py$",
    "^llama_stack/apis/inference/inference\\.py$",
    "^llama_stack/apis/inspect/inspect\\.py$",
@@ -177,8 +176,6 @@ exclude = [
    "^llama_stack/apis/providers/providers\\.py$",
    "^llama_stack/apis/resource\\.py$",
    "^llama_stack/apis/safety/safety\\.py$",
-    "^llama_stack/apis/scoring/scoring\\.py$",
-    "^llama_stack/apis/scoring_functions/scoring_functions\\.py$",
    "^llama_stack/apis/shields/shields\\.py$",
    "^llama_stack/apis/synthetic_data_generation/synthetic_data_generation\\.py$",
    "^llama_stack/apis/telemetry/telemetry\\.py$",
@@ -218,6 +215,7 @@ exclude = [
    "^llama_stack/providers/inline/agents/meta_reference/agent_instance\\.py$",
    "^llama_stack/providers/inline/agents/meta_reference/agents\\.py$",
    "^llama_stack/providers/inline/agents/meta_reference/safety\\.py$",
+    "^llama_stack/providers/inline/evaluation/meta_reference/evaluation\\.py$",
    "^llama_stack/providers/inline/datasetio/localfs/",
    "^llama_stack/providers/inline/eval/meta_reference/eval\\.py$",
    "^llama_stack/providers/inline/inference/meta_reference/config\\.py$",