From a654467552f654a2deaad7618933dcd9ac68c20b Mon Sep 17 00:00:00 2001
From: Michael Dawson
Date: Wed, 28 May 2025 12:23:15 -0700
Subject: [PATCH] feat: add cpu/cuda config for prompt guard (#2194)

# What does this PR do?

Previously, prompt guard was hard-coded to require cuda, which prevented it from being used on an instance without cuda support. This PR allows prompt guard to be configured to use either cpu or cuda.

[//]: # (If resolving an issue, uncomment and update the line below)
Closes [#2133](https://github.com/meta-llama/llama-stack/issues/2133)

## Test Plan

(Edited after incorporating suggestion)

1) Started a stack configured with prompt guard on a system without a GPU and validated that prompt guard could be used through the APIs.
2) Validated on a system with a GPU (but without llama stack) that the Python code selecting between cpu and cuda returned the right value when a cuda device was available. A standalone sketch of that check appears after the patch below.
3) Ran the unit tests as per https://github.com/meta-llama/llama-stack/blob/main/tests/unit/README.md

[//]: # (## Documentation)

---------

Signed-off-by: Michael Dawson
---
 .../providers/inline/safety/prompt_guard/prompt_guard.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/llama_stack/providers/inline/safety/prompt_guard/prompt_guard.py b/llama_stack/providers/inline/safety/prompt_guard/prompt_guard.py
index 56ce8285f..ff87889ea 100644
--- a/llama_stack/providers/inline/safety/prompt_guard/prompt_guard.py
+++ b/llama_stack/providers/inline/safety/prompt_guard/prompt_guard.py
@@ -75,7 +75,9 @@ class PromptGuardShield:
         self.temperature = temperature
         self.threshold = threshold
 
-        self.device = "cuda"
+        self.device = "cpu"
+        if torch.cuda.is_available():
+            self.device = "cuda"
 
         # load model and tokenizer
         self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
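
For test-plan step 2, here is a minimal standalone sketch of the device-selection logic this patch introduces. It is an illustration rather than code from the patch: the `pick_device` helper name is hypothetical, and the only assumption is that PyTorch is installed.

```python
# Hypothetical helper mirroring the patch's device selection.
# Assumes PyTorch is installed; "pick_device" is not part of the patch.
import torch


def pick_device() -> str:
    """Return "cuda" when a CUDA device is available, otherwise "cpu"."""
    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda"
    return device


if __name__ == "__main__":
    # Expected output: "cpu" on a GPU-less machine, "cuda" with a working CUDA setup.
    print(pick_device())
```

Running this on both a GPU-less machine and a machine with a CUDA device reproduces the check from test-plan step 2 without needing a llama stack installation.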