llama-stack/llama_stack/providers
Ihar Hrachyshka c1f7d7f005
fix: miscellaneous job management improvements in torchtune (#1136)
- **refactor: simplify job status extraction a bit**
- **torchtune: save job status on schedule**
- **refactor: get rid of job_list in torchtune job management code**

# What does this PR do?

A failed job is now registered in the API, so its status can still be queried after the failure.
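
The pattern described here (record the job's status when it is scheduled, then update that same record if the run fails) can be illustrated with a minimal sketch. It is not the torchtune provider's actual code. `JobScheduler`, `JobRecord`, `schedule` and `get_status` are hypothetical names, and the job runs synchronously for simplicity. The field names (`job_uuid`, `status`, `scheduled_at`, `started_at`, `completed_at`) mirror the `JobStatusResponse` output in the test plan below.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, Dict, Optional


class JobStatus(Enum):
    # Hypothetical status values; "failed" matches the CLI output in the test plan.
    scheduled = "scheduled"
    in_progress = "in_progress"
    completed = "completed"
    failed = "failed"


@dataclass
class JobRecord:
    job_uuid: str
    status: JobStatus
    scheduled_at: datetime
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None


class JobScheduler:
    """Hypothetical scheduler keeping one dict keyed by job UUID (no separate job list)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, JobRecord] = {}

    def schedule(self, job_uuid: str, fn: Callable[[], None]) -> JobRecord:
        # Register the job *before* running it, so it stays queryable even if it fails.
        record = JobRecord(job_uuid, JobStatus.scheduled, datetime.now(timezone.utc))
        self._jobs[job_uuid] = record
        record.started_at = datetime.now(timezone.utc)
        record.status = JobStatus.in_progress
        try:
            fn()  # Simplification: the real provider runs training asynchronously.
        except Exception:
            # Mark the already-registered job as failed instead of dropping it.
            record.status = JobStatus.failed
            return record
        record.status = JobStatus.completed
        record.completed_at = datetime.now(timezone.utc)
        return record

    def get_status(self, job_uuid: str) -> Optional[JobRecord]:
        return self._jobs.get(job_uuid)


def broken_training() -> None:
    raise RuntimeError("simulated training failure")


scheduler = JobScheduler()
scheduler.schedule("test-job", broken_training)
status = scheduler.get_status("test-job")
print(status.status.value, status.completed_at)  # failed None
```

As in the CLI output, a failed job keeps its `scheduled_at` and `started_at` timestamps while `completed_at` stays `None`.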


## Test Plan

```
$ llama-stack-client post_training status --job-uuid test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73
JobStatusResponse(checkpoints=[], job_uuid='test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73', status='failed', completed_at=None, resources_allocated=None, scheduled_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 3252), started_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 10688))
```


---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-19 19:09:37 -08:00
| Name | Last commit | Date |
|---|---|---|
| inline | fix: miscellaneous job management improvements in torchtune (#1136) | 2025-02-19 19:09:37 -08:00 |
| registry | fix: Update VectorIO config classes in registry (#1079) | 2025-02-13 15:39:13 -08:00 |
| remote | chore: remove llama_models.llama3.api imports from providers (#1107) | 2025-02-19 19:01:29 -08:00 |
| tests | feat: Chunk sqlite-vec writes (#1094) | 2025-02-19 19:07:46 -08:00 |
| utils | chore: remove llama_models.llama3.api imports from providers (#1107) | 2025-02-19 19:01:29 -08:00 |
| `__init__.py` | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |
| `datatypes.py` | chore: move all Llama Stack types from llama-models to llama-stack (#1098) | 2025-02-14 09:10:59 -08:00 |