chore: Change moderations api response to Provider returned categories (#3098)

# What does this PR do?

To comply with the model policies for Llama, return the categories exactly as the provider reports them. As a result, the moderations API response is no longer OpenAI-compatible.

## Test Plan

`SAFETY_MODEL=llama-guard3:8b LLAMA_STACK_CONFIG=starter uv run pytest -v tests/integration/safety/test_safety.py --text-model=llama3.2:3b-instruct-fp16 --embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama`
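
Because the response no longer guarantees a fixed category set, a client would need to work with whatever category names the provider returns. The following is a minimal sketch of that idea, not code from this change: the helper `flagged_categories`, the `threshold` parameter, and the sample category names are all hypothetical, and it assumes the scores arrive as a plain mapping of category name to number.

```python
from typing import Mapping


def flagged_categories(
    category_scores: Mapping[str, float], threshold: float = 0.5
) -> list[str]:
    """Return the provider-reported categories whose score meets the threshold.

    No fixed category list is assumed: any name the provider sends is
    passed through unchanged. (Hypothetical helper, for illustration only.)
    """
    return sorted(
        name for name, score in category_scores.items() if score >= threshold
    )


# Hypothetical provider output; these category names are made up.
sample_scores = {"Violent Crimes": 1.0, "Privacy": 0.0}
print(flagged_categories(sample_scores))
```

Iterating over the returned keys, rather than indexing a hard-coded list, keeps such a client working regardless of which provider sits behind the moderations endpoint.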
2025-12-03 18:00:36 +00:00 · 2025-08-13 09:47:35 -07:00 · 2025-08-13 09:47:35 -07:00 · 25e0553eed
commit 25e0553eed
parent a9081d87b9
6 changed files with 16 additions and 97 deletions
--- a/docs/_static/llama-stack-spec.html
+++ b/docs/_static/llama-stack-spec.html
@@ -16569,7 +16569,7 @@
                        "additionalProperties": {
                            "type": "number"
                        },
-                        "description": "A list of the categories along with their scores as predicted by model. Required set of categories that need to be in response - violence - violence/graphic - harassment - harassment/threatening - hate - hate/threatening - illicit - illicit/violent - sexual - sexual/minors - self-harm - self-harm/intent - self-harm/instructions"
+                        "description": "A list of the categories along with their scores as predicted by model."
                    },
                    "user_message": {
                        "type": "string"
--- a/docs/_static/llama-stack-spec.yaml
+++ b/docs/_static/llama-stack-spec.yaml
@@ -12322,10 +12322,6 @@ components:
            type: number
          description: >-
            A list of the categories along with their scores as predicted by model.
-            Required set of categories that need to be in response - violence - violence/graphic
-            - harassment - harassment/threatening - hate - hate/threatening - illicit
-            - illicit/violent - sexual - sexual/minors - self-harm - self-harm/intent
-            - self-harm/instructions
        user_message:
          type: string
        metadata: