[torchtune integration] post training + eval (#670)

## What does this PR do?

- Adds the related APIs to the experimental-post-training template so that eval can run on the finetuned checkpoint within the template
- Fixes a small bug in meta reference eval
- Improves error handling in post training

## Test Plan

Issued an E2E post training request from the client side (https://github.com/meta-llama/llama-stack-client-python/pull/70) and got eval results successfully.

<img width="1315" alt="Screenshot 2024-12-20 at 12 06 59 PM" src="https://github.com/user-attachments/assets/a09bd524-59ae-490c-908f-2e36ccf27c0a" />
2024-12-20 13:43:13 -08:00
commit 06cb0c837e
parent c8be0bf1c9
4 changed files with 52 additions and 3 deletions
--- a/llama_stack/templates/experimental-post-training/build.yaml
+++ b/llama_stack/templates/experimental-post-training/build.yaml
@@ -4,10 +4,22 @@ distribution_spec:
  description: Experimental template for post training
  docker_image: null
  providers:
+    inference:
+    - inline::meta-reference
+    eval:
+    - inline::meta-reference
+    scoring:
+    - inline::basic
    post_training:
    - inline::torchtune
    datasetio:
    - remote::huggingface
    telemetry:
    - inline::meta-reference
+    agents:
+    - inline::meta-reference
+    safety:
+    - inline::llama-guard
+    memory:
+    - inline::faiss
 image_type: conda
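
For review, the provider list resulting from the hunk above can be sanity-checked with a few lines of Python. This is only a minimal sketch: the `PROVIDERS` mapping is copied by hand from the diff, while `ASSUMED_EVAL_REQUIREMENTS`, `missing_apis`, and `split_provider` are hypothetical names introduced here for illustration. In particular, the set of APIs that eval needs is an assumption and is not taken from the llama-stack source.

```python
# Providers listed in build.yaml after this change (copied from the diff).
PROVIDERS = {
    "inference": ["inline::meta-reference"],
    "eval": ["inline::meta-reference"],
    "scoring": ["inline::basic"],
    "post_training": ["inline::torchtune"],
    "datasetio": ["remote::huggingface"],
    "telemetry": ["inline::meta-reference"],
    "agents": ["inline::meta-reference"],
    "safety": ["inline::llama-guard"],
    "memory": ["inline::faiss"],
}

# ASSUMPTION (illustrative only): APIs the eval flow is expected to rely on.
# This tuple is not read from llama-stack; adjust it to the real dependencies.
ASSUMED_EVAL_REQUIREMENTS = ("inference", "scoring", "datasetio", "agents")


def missing_apis(providers, required):
    """Return the required APIs that have no provider configured."""
    return [api for api in required if not providers.get(api)]


def split_provider(provider_id):
    """Split an id such as 'inline::torchtune' into (kind, name)."""
    kind, _, name = provider_id.partition("::")
    return kind, name


if __name__ == "__main__":
    print("missing:", missing_apis(PROVIDERS, ASSUMED_EVAL_REQUIREMENTS))
    for api, ids in PROVIDERS.items():
        print(api, [split_provider(i) for i in ids])
```

Running the same check against only the pre-change entries (`post_training`, `datasetio`, `telemetry`) would, under the assumption above, report `inference`, `scoring`, and `agents` as missing, which is consistent with the providers this PR adds.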