llama-stack-mirror/llama_stack
Ihar Hrachyshka 2433ef218d feat: implement async job scheduler for torchtune
Training jobs are now executed on a separate thread, and training
requests return a job ID before the job completes. This fixes API
timeouts for any job that takes longer than a minute.
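For illustration, a minimal sketch of the pattern (hypothetical names
such as Scheduler, schedule, and JobStatus; not the actual code added
by this patch): each job runs on a background thread while the caller
immediately gets back a job ID it can poll.

    import threading
    import uuid
    from enum import Enum

    class JobStatus(Enum):
        RUNNING = "running"
        COMPLETED = "completed"
        FAILED = "failed"

    class Scheduler:
        def __init__(self):
            self._jobs: dict[str, JobStatus] = {}

        def schedule(self, handler) -> str:
            job_id = uuid.uuid4().hex

            def _run():
                try:
                    handler()
                    self._jobs[job_id] = JobStatus.COMPLETED
                except Exception:
                    self._jobs[job_id] = JobStatus.FAILED

            self._jobs[job_id] = JobStatus.RUNNING
            threading.Thread(target=_run, daemon=True).start()
            # returned before the job finishes, so the API call
            # doesn't block on long-running training
            return job_id

        def get_status(self, job_id: str) -> JobStatus:
            return self._jobs[job_id]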

Note: the scheduler code is meant to be spun out in the future into a
common provider service that can be reused for different APIs and
providers. It is also expected to back the /jobs API proposed here:

https://github.com/meta-llama/llama-stack/discussions/1238

Hence its somewhat generalized form, which should simplify its
adoption elsewhere in the future.

Note: this patch doesn't attempt to implement the missing APIs (e.g.
job cancellation or removal); that work is left for follow-up PRs.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-28 12:11:59 -04:00
apis feat(api): don't return a payload on file delete (#1640) 2025-03-25 17:12:36 -07:00
cli fix: Use CONDA_DEFAULT_ENV presence as a flag to use conda mode (#1555) 2025-03-27 17:13:22 -04:00
distribution fix: Use CONDA_DEFAULT_ENV presence as a flag to use conda mode (#1555) 2025-03-27 17:13:22 -04:00
models/llama feat: Support "stop" parameter in remote:vLLM (#1715) 2025-03-24 12:42:55 -07:00
providers feat: implement async job scheduler for torchtune 2025-03-28 12:11:59 -04:00
strong_typing fix: Support types.UnionType in schemas (#1721) 2025-03-20 09:54:02 -07:00
templates docs: fix remote-vllm instructions (#1805) 2025-03-27 10:19:51 -04:00
__init__.py export LibraryClient 2024-12-13 12:08:00 -08:00
env.py refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401) 2025-03-04 14:53:47 -08:00
log.py chore: Remove style tags from log formatter (#1808) 2025-03-27 10:18:21 -04:00
schema_utils.py chore: make mypy happy with webmethod (#1758) 2025-03-22 08:17:23 -07:00