llama-stack-mirror/llama_stack/providers/inline/post_training/common
Charlie Doern 46c5b14a22 feat: handle graceful shutdown
Currently this implementation hangs because `trainer.train()` blocks.

Rewrite the implementation to kick off the model download, device instantiation, dataset processing, and training in a monitored subprocess.

All of these steps must run in the same subprocess; otherwise different devices get used across steps, which causes torch errors. A sketch of this pattern follows below.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-05-16 16:41:24 -04:00
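
For illustration, here is a minimal sketch of the monitored-subprocess pattern the commit describes, using Python's standard `multiprocessing` module. The `run_training` body and the helpers named in its comments (`download_model`, `process_dataset`) are hypothetical placeholders, not llama-stack's actual API; only the structure (do all device work in the child, monitor and terminate from the parent) reflects the commit message.

```python
# Minimal sketch of the monitored-subprocess pattern described above.
# run_training and the helpers in its comments are hypothetical stand-ins.
import multiprocessing
import signal
import time


def run_training(model_id: str, output_dir: str) -> None:
    """All device-related work happens inside the subprocess: model
    download, device instantiation, dataset processing, and the
    blocking trainer.train() call."""
    # download_model(model_id)        # hypothetical helper
    # device = torch.device("cuda")   # devices must be created here,
    #                                 # not in the parent process
    # dataset = process_dataset(...)  # hypothetical helper
    # trainer.train()                 # the blocking call
    time.sleep(60)  # placeholder for the long-running training loop


def main() -> None:
    # "spawn" avoids inheriting parent-process (e.g. CUDA) state.
    ctx = multiprocessing.get_context("spawn")
    proc = ctx.Process(target=run_training, args=("my-model", "/tmp/out"))
    proc.start()

    # Forward shutdown signals to the child so it can be stopped cleanly.
    def handle_shutdown(signum, frame):
        if proc.is_alive():
            proc.terminate()

    signal.signal(signal.SIGTERM, handle_shutdown)
    signal.signal(signal.SIGINT, handle_shutdown)

    # Monitor loop: poll with a timeout instead of a bare join() so the
    # parent stays responsive while training runs.
    while proc.is_alive():
        proc.join(timeout=1.0)
    print(f"training subprocess exited with code {proc.exitcode}")


if __name__ == "__main__":
    main()
```

Using the `spawn` start method keeps the child from inheriting CUDA state from the parent, which is one way to avoid the cross-device torch errors mentioned above.
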
__init__.py [post training] define llama stack post training dataset format (#717) 2025-01-14 12:48:49 -08:00
utils.py feat: handle graceful shutdown 2025-05-16 16:41:24 -04:00
validator.py fix: add todo for schema validation (#1991) 2025-04-29 09:59:35 +02:00