llama-stack/llama_stack/providers/inline/post_training/torchtune
Botao Chen 06cb0c837e
[torchtune integration] post training + eval (#670)
## What does this PR do?

- Add the related APIs to the experimental-post-training template to enable eval
on the fine-tuned checkpoint within the template
- A small bug fix in meta reference eval
- A small error-handling improvement in post training


## Test Plan
Issued an E2E post-training request from the client side
(https://github.com/meta-llama/llama-stack-client-python/pull/70) and
successfully got eval results.

![Screenshot 2024-12-20 at 12 06 59 PM](https://github.com/user-attachments/assets/a09bd524-59ae-490c-908f-2e36ccf27c0a)
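
For reference, a minimal sketch of the client-side E2E flow exercised above: start a supervised fine-tuning job through the post_training API (backed by the torchtune provider), fetch the training artifacts, then run eval on the resulting checkpoint. Method and parameter names below are illustrative assumptions and may not match the exact llama-stack-client-python version in the linked PR.

```python
# Hypothetical sketch of the fine-tune -> eval flow; identifiers, configs,
# and response shapes are illustrative, not the verified client API.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")  # assumed local stack endpoint

# 1. Kick off a supervised fine-tuning job (torchtune provider).
client.post_training.supervised_fine_tune(
    job_uuid="post-training-e2e-test",         # illustrative job id
    model="meta-llama/Llama-3.2-3B-Instruct",  # illustrative base model
    training_config={"n_epochs": 1},           # minimal placeholder config
    algorithm_config={"type": "LoRA"},         # placeholder; the real config is richer
    hyperparam_search_config={},
    logger_config={},
)

# 2. Once the job finishes, fetch the training artifacts (checkpoints).
artifacts = client.post_training.job.artifacts(job_uuid="post-training-e2e-test")
checkpoint = artifacts.checkpoints[0]          # fine-tuned checkpoint to evaluate

# 3. Run eval on the fine-tuned checkpoint (meta reference eval provider).
eval_job = client.eval.run_eval(
    task_id="my-eval-task",                    # illustrative eval task id
    task_config={
        "eval_candidate": {
            "type": "model",
            "model": checkpoint.identifier,    # point eval at the fine-tuned model
            "sampling_params": {},
        }
    },
)
print(eval_job)  # poll/inspect the eval job for results once it completes
```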
2024-12-20 13:43:13 -08:00
..
common [2/n][torchtune integration] implement job management and return training artifacts (#593) 2024-12-13 15:00:04 -08:00
datasets [1/n] torchtune <> llama-stack integration skeleton (#540) 2024-12-13 11:05:35 -08:00
recipes [torchtune integration] post training + eval (#670) 2024-12-20 13:43:13 -08:00
__init__.py [1/n] torchtune <> llama-stack integration skeleton (#540) 2024-12-13 11:05:35 -08:00
config.py [1/n] torchtune <> llama-stack integration skeleton (#540) 2024-12-13 11:05:35 -08:00
post_training.py [2/n][torchtune integration] implement job management and return training artifacts (#593) 2024-12-13 15:00:04 -08:00