Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
We should use the Inference APIs to run Llama Guard instead of depending directly on HuggingFace modeling code. All actual inference concerns are then handled by the Inference API.
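
To illustrate the intended separation, here is a minimal sketch and not the actual implementation. It assumes a hypothetical `InferenceBackend` interface with a single `complete` method, a hypothetical model name `llama-guard`, and a simplified prompt. It also assumes the model replies with `safe`, or with `unsafe` followed by a category code on the next line. The shield only builds the prompt and parses the reply; model loading and execution live behind the interface.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class InferenceBackend(Protocol):
    """Hypothetical stand-in for an Inference API client (assumed interface)."""

    def complete(self, model: str, prompt: str) -> str: ...


@dataclass
class ShieldResult:
    is_safe: bool
    violation: Optional[str] = None


class LlamaGuardShield:
    """Runs a safety check by delegating all model execution to the backend."""

    def __init__(self, inference: InferenceBackend, model: str = "llama-guard") -> None:
        # "llama-guard" is a placeholder model identifier, not a real registry name.
        self.inference = inference
        self.model = model

    def run(self, user_message: str) -> ShieldResult:
        # Simplified prompt; the real Llama Guard prompt template is more detailed.
        prompt = f"Classify the following message as safe or unsafe.\n\n{user_message}"
        raw = self.inference.complete(self.model, prompt).strip()
        lines = raw.splitlines()
        if lines and lines[0].strip().lower() == "safe":
            return ShieldResult(is_safe=True)
        # Fail closed: anything other than an explicit "safe" counts as a violation.
        code = lines[1].strip() if len(lines) > 1 else None
        return ShieldResult(is_safe=False, violation=code)


class FakeInference:
    """Canned backend for illustration; no model is loaded."""

    def complete(self, model: str, prompt: str) -> str:
        return "unsafe\nS1" if "attack" in prompt else "safe"
```

With this shape, swapping `FakeInference` for a real Inference client changes nothing in the shield itself, which is the point of removing the direct HuggingFace dependency.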