llama-stack-mirror/llama_stack/providers/inline/post_training
Charlie Doern 65b4fae51d
fix: proper checkpointing logic for HF trainer (#2429)
# What does this PR do?

Currently, only the last saved model is reported as a checkpoint and
associated with the job UUID. Since the HF trainer handles checkpoint
collection during training, we need to add all of the `checkpoint-*`
folders as Checkpoint objects. Adjust the save strategy to be per-epoch
to make this easier and to use less storage.
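
As a rough illustration of the idea above, the following sketch scans a trainer output directory for `checkpoint-<step>` folders and builds one record per folder. It is not the code from this PR: `CheckpointRecord`, `collect_checkpoints`, and the identifier format are hypothetical names chosen for the example, and the epoch numbering assumes the trainer saved exactly once per epoch (for example with `save_strategy="epoch"`).

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path

# The HF Trainer names its checkpoint folders "checkpoint-<global_step>".
_CHECKPOINT_DIR = re.compile(r"^checkpoint-(\d+)$")


@dataclass
class CheckpointRecord:
    """Hypothetical stand-in for the provider's Checkpoint object."""

    identifier: str
    created_at: datetime
    epoch: int
    path: str


def collect_checkpoints(output_dir: Path, job_uuid: str) -> list[CheckpointRecord]:
    """Return one record per checkpoint folder, ordered by training step."""
    found: list[tuple[int, Path]] = []
    for entry in output_dir.iterdir():
        match = _CHECKPOINT_DIR.match(entry.name)
        if match and entry.is_dir():
            found.append((int(match.group(1)), entry))
    found.sort(key=lambda item: item[0])

    records = []
    # Assumption: one save per epoch, so the n-th folder belongs to epoch n.
    for epoch, (_step, folder) in enumerate(found, start=1):
        records.append(
            CheckpointRecord(
                identifier=f"{job_uuid}-{folder.name}",  # hypothetical ID scheme
                created_at=datetime.fromtimestamp(folder.stat().st_mtime, tz=timezone.utc),
                epoch=epoch,
                path=str(folder),
            )
        )
    return records
```

Sorting by the numeric step rather than by folder name matters here, because a plain string sort would place `checkpoint-10` before `checkpoint-2`.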

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-06-27 17:36:25 -04:00
| Name | Last commit | Date |
| --- | --- | --- |
| common | feat: add huggingface post_training impl (#2132) | 2025-05-16 14:41:28 -07:00 |
| huggingface | fix: proper checkpointing logic for HF trainer (#2429) | 2025-06-27 17:36:25 -04:00 |
| torchtune | feat: drop python 3.10 support (#2469) | 2025-06-19 12:07:14 +05:30 |
| `__init__.py` | Add init files to post training folders (#711) | 2025-01-13 20:19:18 -08:00 |