llama-stack-mirror/llama_stack/providers/inline
Derek Higgins 6434cdfdab fix: Run prompt_guard model in a separate thread
Running the GPU model blocks the CPU, so move
it to its own thread. Also wrap it in a lock to
prevent multiple simultaneous runs from
exhausting the GPU. (A sketch of the pattern
follows below.)

Closes: #1746
Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-03-28 14:19:30 +00:00
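The commit combines two ideas: offload the blocking GPU call to a worker thread so the asyncio event loop stays responsive, and serialize those calls with a lock so concurrent requests cannot pile up on the GPU. A minimal sketch of that pattern is below; it is not the actual llama-stack code, and `run_prompt_guard` and `model` are hypothetical names.

```python
import asyncio

# One lock per process: only one inference may touch the GPU at a time.
_gpu_lock = asyncio.Lock()

def run_prompt_guard(model, text: str):
    """Blocking, GPU-bound inference call (hypothetical placeholder)."""
    return model(text)

async def check_prompt(model, text: str):
    # Hold the lock so simultaneous runs cannot exhaust GPU memory,
    # and hand the blocking call to a thread so the event loop is not
    # blocked while the GPU works.
    async with _gpu_lock:
        return await asyncio.to_thread(run_prompt_guard, model, text)
```

`asyncio.to_thread` (Python 3.9+) runs the function in the default thread pool and awaits its result, which is the standard way to keep a synchronous, CPU- or GPU-bound call from stalling the event loop.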
..
agents feat(rag): entire document context with attachments (#1763) 2025-03-23 16:57:48 -07:00
datasetio fix: Call pandas.read_* in a separate thread (#1698) 2025-03-19 10:46:37 -07:00
eval fix: fix jobs api literal return type (#1757) 2025-03-21 14:04:21 -07:00
inference fix: Updating ToolCall.arguments to allow for json strings that can be decoded on client side (#1685) 2025-03-19 10:36:19 -07:00
ios/inference chore: removed executorch submodule (#1265) 2025-02-25 21:57:21 -08:00
post_training chore: fix mypy violations in post_training modules (#1548) 2025-03-18 14:58:16 -07:00
safety fix: Run prompt_guard model in a separate thread 2025-03-28 14:19:30 +00:00
scoring fix: a couple of tests were broken and not yet exercised by our per-PR test workflow 2025-03-21 12:12:14 -07:00
telemetry chore: Revert "chore(telemetry): remove service_name entirely" (#1785) 2025-03-25 14:42:05 -07:00
tool_runtime chore: mypy violations cleanup for inline::{telemetry,tool_runtime,vector_io} (#1711) 2025-03-20 10:01:10 -07:00
vector_io chore: Updating sqlite-vec to make non-blocking calls (#1762) 2025-03-23 17:25:44 -07:00
__init__.py impls -> inline, adapters -> remote (#381) 2024-11-06 14:54:05 -08:00