Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-12-31 06:03:52 +00:00)
The GPU model usage blocks the CPU. Move it to its own thread. Also wrap it in a lock to prevent multiple simultaneous runs from exhausting the GPU. Closes: #1746 Signed-off-by: Derek Higgins <derekh@redhat.com>
A minimal sketch of this pattern follows the directory listing below.
Directory contents:

- apis
- cli
- distribution
- models/llama
- providers
- strong_typing
- templates
- __init__.py
- env.py
- log.py
- schema_utils.py
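The commit message above is terse, so here is a minimal sketch of the pattern it describes: an async handler that pushes a synchronous, GPU-bound model call onto a worker thread and serializes runs with a lock. This is not the actual llama-stack implementation; the names (`run_model_sync`, `generate`, `_gpu_lock`) are hypothetical, and it assumes Python 3.10+ for `asyncio.to_thread` and a loop-independent `asyncio.Lock`.

```python
import asyncio
import time


def run_model_sync(prompt: str) -> str:
    """Stand-in for the synchronous, GPU-bound model call (hypothetical)."""
    time.sleep(1.0)  # simulates GPU work; real code would invoke the model here
    return f"completion for: {prompt}"


# One lock per process keeps overlapping requests from running on the GPU at once.
_gpu_lock = asyncio.Lock()


async def generate(prompt: str) -> str:
    # Hold the lock for the whole run so only one GPU job is in flight at a time.
    async with _gpu_lock:
        # Run the blocking call in a worker thread so the event loop (and the
        # CPU-side request handling) stays responsive while the GPU is busy.
        return await asyncio.to_thread(run_model_sync, prompt)


async def main() -> None:
    # Two concurrent requests: the worker thread keeps the event loop free,
    # and the lock serializes the GPU work between them.
    print(await asyncio.gather(generate("hello"), generate("world")))


if __name__ == "__main__":
    asyncio.run(main())
```

Using an `asyncio.Lock` rather than a `threading.Lock` means waiting requests queue at the `async with` without tying up extra threads; only the request that currently holds the lock occupies a worker thread while its GPU run is in progress.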