Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-12-07 18:57:21 +00:00)

# What does this PR do?

Currently, only the last saved model is reported as a checkpoint and associated with the job UUID. Since the HF trainer handles checkpoint collection during training, we need to add all of the `checkpoint-*` folders as Checkpoint objects. This PR also adjusts the save strategy to be per-epoch, which makes this easier and uses less storage.

Signed-off-by: Charlie Doern <cdoern@redhat.com>

| | | |
|---|---|---|
| .. | ||
| agents | ||
| datasetio | ||
| eval | ||
| files/localfs | ||
| inference | ||
| ios/inference | ||
| post_training | ||
| safety | ||
| scoring | ||
| telemetry | ||
| tool_runtime | ||
| vector_io | ||
| __init__.py | ||
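
The PR description above talks about registering every `checkpoint-*` folder written by the HF trainer, not just the final model. As a rough illustration only, the sketch below scans a training output directory and builds one record per folder. `CheckpointRecord`, `collect_checkpoints`, and `demo` are hypothetical names invented for this example and are not the actual llama-stack `Checkpoint` type or provider code; the only facts assumed about the HF trainer are that it writes folders named `checkpoint-<global_step>` and that `save_strategy="epoch"` makes it save once per epoch.

```python
import re
import tempfile
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path

# The HF Trainer names its checkpoint folders "checkpoint-<global_step>".
_CHECKPOINT_DIR = re.compile(r"^checkpoint-(\d+)$")


@dataclass
class CheckpointRecord:
    """Hypothetical stand-in for a Checkpoint object (not the llama-stack type)."""

    identifier: str
    path: str
    step: int
    created_at: datetime
    job_uuid: str


def collect_checkpoints(output_dir: str, job_uuid: str) -> list[CheckpointRecord]:
    """Return one record per checkpoint-* folder in output_dir, ordered by step."""
    records = []
    for entry in Path(output_dir).iterdir():
        match = _CHECKPOINT_DIR.match(entry.name)
        if match is None or not entry.is_dir():
            continue  # skip the final merged model and any stray files
        records.append(
            CheckpointRecord(
                identifier=entry.name,
                path=str(entry),
                step=int(match.group(1)),
                created_at=datetime.fromtimestamp(entry.stat().st_mtime, tz=timezone.utc),
                job_uuid=job_uuid,
            )
        )
    # Sort numerically so checkpoint-1000 comes after checkpoint-200.
    return sorted(records, key=lambda record: record.step)


def demo() -> list[str]:
    """Simulate a trainer output directory and list the checkpoints found in it."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("checkpoint-200", "checkpoint-100", "merged_model"):
            (Path(tmp) / name).mkdir()
        return [c.identifier for c in collect_checkpoints(tmp, job_uuid="job-1234")]
```

With a per-epoch save strategy, a scan like this would yield one record per epoch, so the number of folders (and the storage used) grows with the epoch count instead of with the step count.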