llama-stack-mirror/llama_stack/providers
Charlie Doern 65b4fae51d
fix: proper checkpointing logic for HF trainer (#2429)
# What does this PR do?

Currently, only the last saved model is reported as a checkpoint and
associated with the job UUID. Since the HF trainer handles checkpoint
collection during training, we need to add all of the `checkpoint-*`
folders as Checkpoint objects. This change also adjusts the save strategy
to be per-epoch, which makes this easier and uses less storage.
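
To illustrate the idea, here is a minimal sketch of collecting every `checkpoint-*` folder from a trainer output directory and associating each one with a job UUID. It is not the code from this PR: `CheckpointRecord`, `collect_checkpoints`, and the identifier format are hypothetical names chosen for illustration, and the only assumption about the HF trainer is that it writes folders named `checkpoint-<step>` into its output directory.

```python
import re
import tempfile
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class CheckpointRecord:
    """Hypothetical stand-in for the project's Checkpoint object."""

    identifier: str
    path: str
    created_at: datetime
    step: int


# Matches folder names such as "checkpoint-500" and captures the step number.
_CHECKPOINT_DIR = re.compile(r"^checkpoint-(\d+)$")


def collect_checkpoints(output_dir: Path, job_uuid: str) -> list[CheckpointRecord]:
    """Return one record per checkpoint-* folder, ordered by training step."""
    records = []
    for child in output_dir.iterdir():
        match = _CHECKPOINT_DIR.match(child.name)
        # Skip plain files and unrelated folders (logs, final model, etc.).
        if match is None or not child.is_dir():
            continue
        created = datetime.fromtimestamp(child.stat().st_mtime, tz=timezone.utc)
        records.append(
            CheckpointRecord(
                identifier=f"{job_uuid}-{child.name}",  # assumed naming scheme
                path=str(child),
                created_at=created,
                step=int(match.group(1)),
            )
        )
    # Sort numerically so checkpoint-2 comes before checkpoint-10.
    return sorted(records, key=lambda record: record.step)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("checkpoint-10", "checkpoint-2", "runs"):
            (root / name).mkdir()
        for record in collect_checkpoints(root, "job-123"):
            print(record.identifier, record.step)
```

Sorting by the parsed step rather than by folder name avoids the lexicographic trap where `checkpoint-10` would sort ahead of `checkpoint-2`. With a per-epoch save strategy, each record would correspond to one completed epoch.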

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-06-27 17:36:25 -04:00
| Name | Last commit | Date |
| --- | --- | --- |
| inline | fix: proper checkpointing logic for HF trainer (#2429) | 2025-06-27 17:36:25 -04:00 |
| registry | chore: isolate bare minimum project dependencies (#2282) | 2025-06-26 10:14:27 +02:00 |
| remote | chore: standardize unsupported model error #2517 (#2518) | 2025-06-27 14:26:58 -04:00 |
| utils | fix: ValueError in faiss vector database serialization (resolves #2519) (#2526) | 2025-06-27 14:34:52 -04:00 |
| `__init__.py` | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |
| `datatypes.py` | fix: finish conversion to StrEnum (#2514) | 2025-06-26 08:01:26 +05:30 |