llama-stack-mirror/llama_stack/providers/inline
Derek Higgins 6434cdfdab fix: Run prompt_guard model in a separate thread
Running the GPU model blocks the CPU, so move
it to its own thread. Also wrap it in a lock to
prevent multiple simultaneous runs from
exhausting the GPU. (A sketch of the pattern
follows below.)

Closes: #1746
Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-03-28 14:19:30 +00:00
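The commit combines two ideas: offload the blocking GPU call to a worker thread so the asyncio event loop stays responsive, and serialize those calls with a lock so concurrent requests cannot pile up on the GPU. A minimal sketch of that pattern is below; it is not the actual llama-stack code, and `run_prompt_guard` and `model` are hypothetical names.

```python
import asyncio

# One lock per process: only one inference may touch the GPU at a time.
_gpu_lock = asyncio.Lock()

def run_prompt_guard(model, text: str):
    """Blocking, GPU-bound inference call (hypothetical placeholder)."""
    return model(text)

async def check_prompt(model, text: str):
    # Hold the lock so simultaneous runs cannot exhaust GPU memory,
    # and hand the blocking call to a thread so the event loop is not
    # blocked while the GPU works.
    async with _gpu_lock:
        return await asyncio.to_thread(run_prompt_guard, model, text)
```

`asyncio.to_thread` (Python 3.9+) runs the function in the default thread pool and awaits its result, which is the standard way to keep a synchronous, CPU- or GPU-bound call from stalling the event loop.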
..
agents feat(rag): entire document context with attachments (#1763) 2025-03-23 16:57:48 -07:00
datasetio fix: Call pandas.read_* in a separate thread (#1698) 2025-03-19 10:46:37 -07:00
eval fix: fix jobs api literal return type (#1757) 2025-03-21 14:04:21 -07:00
inference fix: Updating ToolCall.arguments to allow for json strings that can be decoded on client side (#1685) 2025-03-19 10:36:19 -07:00
ios/inference chore: removed executorch submodule (#1265) 2025-02-25 21:57:21 -08:00
post_training chore: fix mypy violations in post_training modules (#1548) 2025-03-18 14:58:16 -07:00
safety fix: Run prompt_guard model in a separate thread 2025-03-28 14:19:30 +00:00
scoring fix: a couple of tests were broken and not yet exercised by our per-PR test workflow 2025-03-21 12:12:14 -07:00
telemetry chore: Revert "chore(telemetry): remove service_name entirely" (#1785) 2025-03-25 14:42:05 -07:00
tool_runtime chore: mypy violations cleanup for inline::{telemetry,tool_runtime,vector_io} (#1711) 2025-03-20 10:01:10 -07:00
vector_io chore: Updating sqlite-vec to make non-blocking calls (#1762) 2025-03-23 17:25:44 -07:00
__init__.py impls -> inline, adapters -> remote (#381) 2024-11-06 14:54:05 -08:00