llama-stack-mirror/llama_stack/providers/inline/post_training
Charlie Doern 46c5b14a22 feat: handle graceful shutdown
Currently this implementation hangs because `trainer.train()` blocks.

Rewrite the implementation to kick off the model download, device instantiation, dataset processing, and training in a monitored subprocess.

All of these steps need to run in the subprocess; otherwise different devices are used, which causes torch errors.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-05-16 16:41:24 -04:00
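A minimal sketch of the subprocess pattern the commit describes, not the actual llama-stack implementation (the function names and config shape here are hypothetical). All device-touching work happens inside the child process so the parent never initializes torch devices; the parent only monitors the child and can terminate it for a graceful shutdown.

```python
import asyncio
import multiprocessing


def _run_training(config: dict) -> None:
    # Placeholder for the real work: in the actual provider this is where the
    # model download, device instantiation, dataset processing, and the
    # trainer.train() call would happen, all inside the child process.
    print(f"training with config: {config}")


async def run_training_in_subprocess(config: dict) -> None:
    # "spawn" avoids inheriting any parent-process CUDA/device state.
    ctx = multiprocessing.get_context("spawn")
    proc = ctx.Process(target=_run_training, args=(config,))
    proc.start()
    try:
        # Poll instead of a blocking join() so the event loop stays responsive
        # and cancellation (graceful shutdown) can be delivered.
        while proc.is_alive():
            await asyncio.sleep(1)
        if proc.exitcode != 0:
            raise RuntimeError(f"training subprocess exited with {proc.exitcode}")
    except asyncio.CancelledError:
        # Graceful shutdown: ask the child to stop, then force-kill if needed.
        proc.terminate()
        proc.join(timeout=10)
        if proc.is_alive():
            proc.kill()
        raise
```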
common feat: handle graceful shutdown 2025-05-16 16:41:24 -04:00
huggingface feat: handle graceful shutdown 2025-05-16 16:41:24 -04:00
torchtune feat: handle graceful shutdown 2025-05-16 16:41:24 -04:00
__init__.py Add init files to post training folders (#711) 2025-01-13 20:19:18 -08:00