[torchtune integration] post training + eval (#670)

## What does this PR do?

- Add the related APIs to the experimental-post-training template so that
eval can run on the fine-tuned checkpoint within the template
- Fix a small bug in meta-reference eval
- Improve error handling in post training


## Test Plan
Issued an E2E post-training request from the client side
(https://github.com/meta-llama/llama-stack-client-python/pull/70) and got
eval results successfully.

<img width="1315" alt="Screenshot 2024-12-20 at 12 06 59 PM"
src="https://github.com/user-attachments/assets/a09bd524-59ae-490c-908f-2e36ccf27c0a"
/>
Botao Chen 2024-12-20 13:43:13 -08:00 committed by GitHub
parent c8be0bf1c9
commit 06cb0c837e
4 changed files with 52 additions and 3 deletions

@@ -4,10 +4,22 @@ distribution_spec:
  description: Experimental template for post training
  docker_image: null
  providers:
    inference:
    - inline::meta-reference
    eval:
    - inline::meta-reference
    scoring:
    - inline::basic
    post_training:
    - inline::torchtune
    datasetio:
    - remote::huggingface
    telemetry:
    - inline::meta-reference
    agents:
    - inline::meta-reference
    safety:
    - inline::llama-guard
    memory:
    - inline::faiss
image_type: conda
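
As a rough illustration of what the template change enables, the sketch below mirrors the provider list above as a plain Python dict and checks that every API a fine-tune-then-eval flow would plausibly need has a provider. The `REQUIRED_FOR_EVAL_FLOW` tuple and the helpers `missing_apis` and `split_provider` are hypothetical names introduced here for illustration; they are not part of llama-stack, and the set of required APIs is an assumption rather than something this PR states.

```python
# Providers as listed in the experimental post-training template above.
TEMPLATE_PROVIDERS = {
    "inference": ["inline::meta-reference"],
    "eval": ["inline::meta-reference"],
    "scoring": ["inline::basic"],
    "post_training": ["inline::torchtune"],
    "datasetio": ["remote::huggingface"],
    "telemetry": ["inline::meta-reference"],
    "agents": ["inline::meta-reference"],
    "safety": ["inline::llama-guard"],
    "memory": ["inline::faiss"],
}

# Assumption (not from llama-stack): the APIs a post-training + eval
# round trip would rely on.
REQUIRED_FOR_EVAL_FLOW = (
    "post_training",
    "datasetio",
    "inference",
    "eval",
    "scoring",
)


def missing_apis(providers, required=REQUIRED_FOR_EVAL_FLOW):
    """Return the required APIs that have no provider configured, in order."""
    return [api for api in required if not providers.get(api)]


def split_provider(provider_id):
    """Split an ID such as 'inline::torchtune' into ('inline', 'torchtune')."""
    kind, _, name = provider_id.partition("::")
    return kind, name


if __name__ == "__main__":
    print("missing:", missing_apis(TEMPLATE_PROVIDERS))
    for api, ids in TEMPLATE_PROVIDERS.items():
        print(api, [split_provider(i) for i in ids])
```

A check like this could run before issuing the E2E request from the test plan, so that a missing `eval` or `scoring` provider shows up as a configuration problem rather than a runtime failure.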