llama-stack/llama_stack/providers/inline/post_training
Botao Chen 20383bfea5
[3/n][torchtune integration] add validation logic (#600)
## What does this PR do?
- add validation logic (validation loss and perplexity) to the SFT recipe
- add a progress bar to both training and validation to better track
progress on the server side (eval has similar logic)
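The following is a minimal sketch of how validation loss and perplexity could be computed in a loop like this. It assumes each validation batch yields its mean cross-entropy loss and its token count. The `validate` helper and its `(loss, tokens)` input format are hypothetical and are not the torchtune recipe's API. Perplexity is taken as `exp` of the token-weighted mean loss. A plain-text progress bar stands in for whatever progress library the recipe actually uses.

```python
import math
from typing import Dict, Iterable, Tuple


def validate(batches: Iterable[Tuple[float, int]], total: int) -> Dict[str, float]:
    """Hypothetical helper: each batch is (mean cross-entropy over its tokens, token count)."""
    loss_sum = 0.0
    token_count = 0
    width = 20
    for step, (batch_loss, num_tokens) in enumerate(batches, start=1):
        # Undo the per-batch mean so batches with more tokens weigh more.
        loss_sum += batch_loss * num_tokens
        token_count += num_tokens
        # Simple text progress bar, redrawn in place on the same line.
        done = int(width * step / total)
        print(f"\rvalidation [{'#' * done}{'.' * (width - done)}] {step}/{total}",
              end="", flush=True)
    print()
    if token_count == 0:
        raise ValueError("no validation tokens")
    val_loss = loss_sum / token_count
    # Perplexity is the exponential of the mean per-token cross-entropy.
    return {"validation_loss": val_loss, "perplexity": math.exp(val_loss)}
```

For example, `validate([(2.0, 100), (1.0, 100)], total=2)` gives a validation loss of 1.5 and a perplexity of `exp(1.5)`. Weighting by token count keeps the result from depending on how the data was split into batches.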


## Test Plan
Validation metrics (loss and perplexity) show up in the `training_metric` part of the `Checkpoint`:  
<img width="799" alt="Screenshot 2024-12-12 at 3 21 52 PM"
src="https://github.com/user-attachments/assets/36330ffe-0555-4b2d-93f0-9487dfdf7b4e"
/>

The progress bar shows up as expected:
<img width="476" alt="Screenshot 2024-12-12 at 3 38 11 PM"
src="https://github.com/user-attachments/assets/77306fa2-cb9c-460f-8efc-b41bbe424a7d"
/>
2024-12-13 16:35:06 -08:00
torchtune [3/n][torchtune integration] add validation logic (#600) 2024-12-13 16:35:06 -08:00