Commit graph

1 commit

Author SHA1 Message Date
Charlie Doern
46c5b14a22 feat: handle graceful shutdown
Currently this implementation hangs because `trainer.train()` blocks.

Rewrite the implementation to kick off the model download, device instantiation, dataset processing, and training in a monitored subprocess.

All of these steps need to run in the subprocess; otherwise different devices are used, which causes torch errors.
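
The monitored-subprocess pattern described above might look roughly like the sketch below. It is not the code from this commit: `run_monitored`, `CHILD_CODE`, `stop_event`, and the timing parameters are hypothetical names chosen for illustration, and the real change may well use `multiprocessing` rather than `subprocess`. The child here only sleeps; in the real job it would be the one place that downloads the model, instantiates the device, processes the dataset, and calls `trainer.train()`.

```python
import subprocess
import sys
import threading
import time

# Hypothetical stand-in for the real job. In practice the child would run the
# model download, device setup, dataset processing, and trainer.train().
CHILD_CODE = "import time\nwhile True:\n    time.sleep(0.1)\n"


def run_monitored(child_code, stop_event, poll_interval=0.05, grace_period=5.0):
    """Run child_code in a separate Python process and monitor it.

    Returns the child's exit code. If stop_event is set while the child is
    still running, the child is asked to terminate and is killed only if it
    does not exit within grace_period seconds.
    """
    proc = subprocess.Popen([sys.executable, "-c", child_code])
    try:
        # The parent never blocks on the work itself; it only polls the child.
        while proc.poll() is None:
            if stop_event.wait(poll_interval):
                proc.terminate()  # polite request first
                try:
                    proc.wait(timeout=grace_period)
                except subprocess.TimeoutExpired:
                    proc.kill()  # forceful fallback
                    proc.wait()
                break
        return proc.returncode
    finally:
        # Never leave an orphaned child behind if the monitor itself fails.
        if proc.poll() is None:
            proc.kill()
            proc.wait()


if __name__ == "__main__":
    stop = threading.Event()
    # Simulate a shutdown request arriving half a second into the job.
    threading.Timer(0.5, stop.set).start()
    print("child exit code:", run_monitored(CHILD_CODE, stop))
```

Because the blocking call lives in the child, the parent stays responsive to a shutdown request and can stop the job within a bounded time. Keeping every torch-touching step inside that one child is what avoids the mixed-device errors mentioned above.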

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-05-16 16:41:24 -04:00