llama-stack-mirror/llama_stack/providers/inline
Ihar Hrachyshka c1f7d7f005
fix: miscellaneous job management improvements in torchtune (#1136)
- **refactor: simplify job status extraction a bit**
- **torchtune: save job status on schedule**
- **refactor: get rid of job_list in torchtune job management code**

# What does this PR do?

A failed job is now registered in the API, so its status can be queried.
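
The idea of saving job status at schedule time can be sketched roughly as follows. This is a minimal illustration only, not the torchtune provider's actual code: `JobStatus`, `JobRecord`, and `JobRegistry` are hypothetical names, and the field names are borrowed from the `JobStatusResponse` shown in the test plan.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, Dict, Optional


class JobStatus(Enum):
    # Hypothetical status values for illustration.
    scheduled = "scheduled"
    in_progress = "in_progress"
    completed = "completed"
    failed = "failed"


@dataclass
class JobRecord:
    job_uuid: str
    status: JobStatus
    scheduled_at: datetime
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None


class JobRegistry:
    """Keeps one record per job in a dict, with no separate job list."""

    def __init__(self) -> None:
        self._jobs: Dict[str, JobRecord] = {}

    def schedule(self, job_uuid: str) -> JobRecord:
        # Record the job as soon as it is scheduled, so it stays
        # visible even if it fails before doing any work.
        record = JobRecord(
            job_uuid=job_uuid,
            status=JobStatus.scheduled,
            scheduled_at=datetime.now(timezone.utc),
        )
        self._jobs[job_uuid] = record
        return record

    def run(self, job_uuid: str, work: Callable[[], None]) -> JobRecord:
        record = self._jobs[job_uuid]
        record.status = JobStatus.in_progress
        record.started_at = datetime.now(timezone.utc)
        try:
            work()
        except Exception:
            # A failed job keeps its record; completed_at stays unset.
            record.status = JobStatus.failed
            return record
        record.status = JobStatus.completed
        record.completed_at = datetime.now(timezone.utc)
        return record

    def status(self, job_uuid: str) -> Optional[JobRecord]:
        # Returns None for an unknown job instead of raising.
        return self._jobs.get(job_uuid)
```

Under these assumptions, a job whose `work` callable raises still has a record with `status` set to `failed`, `started_at` populated, and `completed_at` left as `None`, which is the shape of the response in the test plan.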

## Test Plan

```
$ llama-stack-client post_training status --job-uuid test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73
JobStatusResponse(checkpoints=[], job_uuid='test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73', status='failed', completed_at=None, resources_allocated=None, scheduled_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 3252), started_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 10688))
```

---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-19 19:09:37 -08:00

| Name | Last commit | Date |
| --- | --- | --- |
| `agents` | feat: log start, complete time to Agent steps (#1116) | 2025-02-14 17:48:06 -08:00 |
| `datasetio` | build: format codebase imports using ruff linter (#1028) | 2025-02-13 10:06:21 -08:00 |
| `eval` | build: configure ruff from pyproject.toml (#1100) | 2025-02-14 09:01:57 -08:00 |
| `inference` | chore: remove llama_models.llama3.api imports from providers (#1107) | 2025-02-19 19:01:29 -08:00 |
| `ios/inference` | LocalInferenceImpl update for LS 0.1 (#911) | 2025-02-02 09:49:40 -08:00 |
| `post_training` | fix: miscellaneous job management improvements in torchtune (#1136) | 2025-02-19 19:09:37 -08:00 |
| `safety` | chore: move all Llama Stack types from llama-models to llama-stack (#1098) | 2025-02-14 09:10:59 -08:00 |
| `scoring` | build: format codebase imports using ruff linter (#1028) | 2025-02-13 10:06:21 -08:00 |
| `telemetry` | build: format codebase imports using ruff linter (#1028) | 2025-02-13 10:06:21 -08:00 |
| `tool_runtime` | fix: Get distro_codegen.py working with default deps and enabled in pre-commit hooks (#1123) | 2025-02-19 18:39:20 -08:00 |
| `vector_io` | feat: Chunk sqlite-vec writes (#1094) | 2025-02-19 19:07:46 -08:00 |
| `__init__.py` | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |