Use inference APIs for executing Llama Guard (#121)

We should use the Inference APIs to execute Llama Guard instead of depending directly on HuggingFace modeling code. The actual inference concerns are then handled by the Inference API.
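
As a rough illustration of the idea (not the code in this commit), a shield can delegate generation to an inference client and only parse the verdict text. This is a minimal sketch under stated assumptions: `InferenceClient`, `complete`, `ShieldResult`, `run_shield`, `FakeInference`, the prompt wording, and the model name are all hypothetical placeholders, not the llama_stack API. The only thing assumed about Llama Guard is its usual output shape: `safe`, or `unsafe` followed by a line of comma-separated category codes.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Protocol


class InferenceClient(Protocol):
    """Hypothetical stand-in for an Inference API client (not the real llama_stack interface)."""

    async def complete(self, model: str, prompt: str) -> str: ...


@dataclass
class ShieldResult:
    is_violation: bool
    categories: List[str]


def parse_guard_output(text: str) -> ShieldResult:
    """Parse a verdict of the form 'safe' or 'unsafe\\n<comma-separated codes>'."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if lines and lines[0].lower() == "safe":
        return ShieldResult(is_violation=False, categories=[])
    if lines and lines[0].lower() == "unsafe":
        codes = lines[1].split(",") if len(lines) > 1 else []
        return ShieldResult(True, [c.strip() for c in codes if c.strip()])
    raise ValueError(f"unexpected shield output: {text!r}")


async def run_shield(
    client: InferenceClient,
    user_message: str,
    model: str = "Llama-Guard-3-8B",  # placeholder model identifier (assumption)
) -> ShieldResult:
    # The shield only builds a prompt and parses the reply; model loading,
    # device placement, and generation are left to the inference client.
    prompt = (
        "Classify the following user message as safe or unsafe.\n\n"
        f"User: {user_message}"
    )
    raw = await client.complete(model=model, prompt=prompt)
    return parse_guard_output(raw)


class FakeInference:
    """Canned responses so the sketch runs without a model."""

    async def complete(self, model: str, prompt: str) -> str:
        return "unsafe\nS1,S10" if "attack" in prompt else "safe"


if __name__ == "__main__":
    print(asyncio.run(run_shield(FakeInference(), "plan an attack")))
    print(asyncio.run(run_shield(FakeInference(), "hello there")))
```

With this split, the safety provider needs no GPU-specific dependencies of its own, which is consistent with the registry change below dropping `accelerate` and switching to a CPU-only `torch` wheel.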
2025-06-27 18:50:41 +00:00 · 2024-09-28 15:40:06 -07:00 · 0a3999a9a4
commit 0a3999a9a4
parent 6236634d84
9 changed files with 167 additions and 204 deletions
--- a/llama_stack/providers/registry/safety.py
+++ b/llama_stack/providers/registry/safety.py
@@ -21,10 +21,9 @@ def available_providers() -> List[ProviderSpec]:
            api=Api.safety,
            provider_id="meta-reference",
            pip_packages=[
-                "accelerate",
                "codeshield",
-                "torch",
                "transformers",
+                "torch --index-url https://download.pytorch.org/whl/cpu",
            ],
            module="llama_stack.providers.impls.meta_reference.safety",
            config_class="llama_stack.providers.impls.meta_reference.safety.SafetyConfig",