llama-stack-mirror/llama_stack/providers/inline/post_training/huggingface/recipes
Charlie Doern d6228bb90e fix: proper checkpointing logic for HF trainer
Currently, only the last saved model is reported as a checkpoint and associated with the job UUID. Since the HF trainer handles checkpoint collection during training, we need to add all of the `checkpoint-*` folders as Checkpoint objects. Adjust the save strategy to be per-epoch, both to make this easier and to use less storage.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-06-25 20:01:36 -04:00
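The commit message describes turning every `checkpoint-*` folder written by the HF trainer into a Checkpoint object, instead of reporting only the final model. Below is a minimal, self-contained sketch of that idea, not the code in `finetune_single_device.py`. It assumes the Hugging Face Trainer's folder convention `checkpoint-<global_step>` and per-epoch saving, so the n-th folder (sorted by step) corresponds to epoch n. The `Checkpoint` dataclass, its field names, the `collect_checkpoints` helper, and the identifier scheme are all hypothetical stand-ins chosen for illustration.

```python
import re
import tempfile
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


# Hypothetical stand-in for llama-stack's Checkpoint type; field names are illustrative only.
@dataclass
class Checkpoint:
    identifier: str
    created_at: datetime
    epoch: int
    post_training_job_id: str
    path: str


# The HF Trainer names its checkpoint folders "checkpoint-<global_step>".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def collect_checkpoints(output_dir: str, job_uuid: str) -> list[Checkpoint]:
    """Turn every checkpoint-<step> folder under output_dir into a Checkpoint, oldest first."""
    found = []
    for entry in Path(output_dir).iterdir():
        match = _CHECKPOINT_RE.match(entry.name)
        if entry.is_dir() and match:
            found.append((int(match.group(1)), entry))
    # Sort numerically by step so checkpoint-10 comes after checkpoint-9.
    found.sort(key=lambda item: item[0])

    checkpoints = []
    # Assumes per-epoch saving with no pruned folders, so the n-th folder is epoch n.
    for epoch, (_step, path) in enumerate(found, start=1):
        checkpoints.append(
            Checkpoint(
                identifier=f"{job_uuid}-{path.name}",  # hypothetical naming scheme
                created_at=datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc),
                epoch=epoch,
                post_training_job_id=job_uuid,
                path=str(path),
            )
        )
    return checkpoints


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for step in (9, 18, 27):
            (Path(tmp) / f"checkpoint-{step}").mkdir()
        (Path(tmp) / "runs").mkdir()  # unrelated folder, ignored by the regex
        for ckpt in collect_checkpoints(tmp, "job-123"):
            print(ckpt.epoch, Path(ckpt.path).name)
```

In `transformers`, per-epoch saving is selected with `TrainingArguments(save_strategy="epoch")`. The epoch numbering above breaks if `save_total_limit` deletes older folders or if training resumes midway. A more robust variant would read the epoch from each folder's `trainer_state.json` rather than inferring it from sort order.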
__init__.py ci: add python package build test (#2457) 2025-06-19 18:57:32 +05:30
finetune_single_device.py fix: proper checkpointing logic for HF trainer 2025-06-25 20:01:36 -04:00