llama-stack/llama_stack
Botao Chen 20383bfea5
[3/n][torchtune integration] add validation logic (#600)
## What does this PR do?
- add validation logic to the SFT recipe (validation loss and perplexity)
- add a progress bar to both training and validation to make progress easier
to track on the server side (eval uses similar logic)
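The two validation metrics named above are related: perplexity is the exponential of the mean validation loss. The sketch below is a minimal, hypothetical illustration of that relationship, not the recipe code from this PR. Several details are invented here:

- The `validate` function is made up for this example.
- Each batch is assumed to arrive as a `(summed_cross_entropy, num_tokens)` pair.
- A one-line stderr counter stands in for a real progress bar.

```python
import math
import sys
from typing import Dict, Iterable, Tuple


def validate(batches: Iterable[Tuple[float, int]], total: int) -> Dict[str, float]:
    """Aggregate validation loss and perplexity over batches.

    Hypothetical input: each batch is a (summed token-level cross-entropy,
    number of tokens) pair. In a real recipe these would come from a
    forward pass over the validation set with gradients disabled.
    """
    loss_sum = 0.0
    token_count = 0
    for step, (batch_loss_sum, batch_tokens) in enumerate(batches, start=1):
        loss_sum += batch_loss_sum
        token_count += batch_tokens
        # Minimal stand-in for a progress bar: rewrite one status line on stderr.
        print(f"\rvalidation {step}/{total}", end="", file=sys.stderr)
    print(file=sys.stderr)  # finish the status line
    if token_count == 0:
        raise ValueError("validation set produced no tokens")
    # Validation loss is the mean per-token cross-entropy.
    mean_loss = loss_sum / token_count
    # Perplexity is the exponential of that mean loss.
    return {"val_loss": mean_loss, "val_perplexity": math.exp(mean_loss)}
```

For example, `validate([(4.0, 2), (2.0, 2)], total=2)` returns a validation loss of 1.5 and a perplexity of exp(1.5) ≈ 4.48. The sketch weights by token count rather than averaging per-batch means, so batches of different lengths do not skew the result.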


## Test Plan
validation metrics show up in the `training_metric` part of the Checkpoint:
<img width="799" alt="Screenshot 2024-12-12 at 3 21 52 PM"
src="https://github.com/user-attachments/assets/36330ffe-0555-4b2d-93f0-9487dfdf7b4e"
/>

progress bar shows up as expected:
<img width="476" alt="Screenshot 2024-12-12 at 3 38 11 PM"
src="https://github.com/user-attachments/assets/77306fa2-cb9c-460f-8efc-b41bbe424a7d"
/>
2024-12-13 16:35:06 -08:00
| Name | Last commit | Date |
|---|---|---|
| apis | [2/n][torchtune integration] implement job management and return training artifacts (#593) | 2024-12-13 15:00:04 -08:00 |
| cli | doc: llama-stack build --config help text references old directory (#596) | 2024-12-10 17:42:02 -08:00 |
| distribution | add embedding model by default to distribution templates (#617) | 2024-12-13 12:48:00 -08:00 |
| providers | [3/n][torchtune integration] add validation logic (#600) | 2024-12-13 16:35:06 -08:00 |
| scripts | Integrate distro docs into the restructured docs | 2024-11-20 23:20:05 -08:00 |
| templates | add embedding model by default to distribution templates (#617) | 2024-12-13 12:48:00 -08:00 |
| `__init__.py` | export LibraryClient | 2024-12-13 12:08:00 -08:00 |