Improve groq OpenAI API compatibility

This doesn't get Groq to 100% on the OpenAI API verification tests,
but it does get it to 88.2% with Llama Stack in the middle, compared
to 61.8% when using an OpenAI client against Groq directly.
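
For context, the comparison is between pointing an OpenAI client at
Groq's OpenAI-compatible endpoint directly versus pointing it at a
Llama Stack server that proxies to Groq. A minimal sketch of the two
setups, where the Llama Stack base URL, port, and model id are
illustrative assumptions rather than values from this commit:

    from openai import OpenAI

    # OpenAI client pointed directly at Groq's OpenAI-compatible
    # endpoint (the 61.8% setup).
    direct = OpenAI(
        base_url="https://api.groq.com/openai/v1",
        api_key="YOUR_GROQ_API_KEY",
    )

    # The same client pointed at a Llama Stack server sitting in the
    # middle (the 88.2% setup). Base URL and model id are assumptions.
    via_stack = OpenAI(
        base_url="http://localhost:8321/v1/openai/v1",
        api_key="none",  # Groq credentials are configured server-side
    )

    response = via_stack.chat.completions.create(
        model="groq/llama-3.3-70b-versatile",  # illustrative model id
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)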

The groq provider doesn't use litellm under the covers in its
openai_chat_completion endpoint; instead, it uses an AsyncOpenAI
client directly, with some special handling to improve the conformance
of responses for response_format usage and tool calling.
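
As a rough sketch of that shape (not the provider's actual code: the
class name, attribute, and the elided conformance handling are all
invented for illustration):

    from openai import AsyncOpenAI

    class GroqAdapterSketch:
        """Illustrative shape only; not the code in this commit."""

        def _client(self) -> AsyncOpenAI:
            # Talk to Groq's OpenAI-compatible endpoint directly
            # instead of going through litellm.
            return AsyncOpenAI(
                base_url="https://api.groq.com/openai/v1",
                api_key=self.api_key,  # hypothetical attribute
            )

        async def openai_chat_completion(self, model: str, messages: list, **kwargs):
            params = {"model": model, "messages": messages, **kwargs}
            if params.get("response_format") is not None:
                # Groq deviates from OpenAI here, so the request and
                # response would be massaged into conformant shape
                # (details elided in this sketch).
                pass
            return await self._client().chat.completions.create(**params)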

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Author: Ben Browning
Date:   2025-04-13 13:35:53 -04:00
Commit: 8a1c0a1008
Parent: 657bb12e85

16 changed files with 418 additions and 45 deletions

@@ -1354,14 +1354,7 @@ class OpenAIChatCompletionToLlamaStackMixin:
         i = 0
         async for chunk in response:
             event = chunk.event
-            if event.stop_reason == StopReason.end_of_turn:
-                finish_reason = "stop"
-            elif event.stop_reason == StopReason.end_of_message:
-                finish_reason = "eos"
-            elif event.stop_reason == StopReason.out_of_tokens:
-                finish_reason = "length"
-            else:
-                finish_reason = None
+            finish_reason = _convert_stop_reason_to_openai_finish_reason(event.stop_reason)
             if isinstance(event.delta, TextDelta):
                 text_delta = event.delta.text
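
The new helper's definition isn't shown in this hunk; a plausible body,
inferred directly from the deleted branches above (the import path is
illustrative, and the real definition lives elsewhere in this commit):

    from llama_stack.apis.inference import StopReason  # path assumed

    def _convert_stop_reason_to_openai_finish_reason(stop_reason: StopReason) -> str | None:
        # Same mapping the deleted if/elif chain expressed, as a dict
        # lookup; unknown stop reasons fall through to None.
        return {
            StopReason.end_of_turn: "stop",
            StopReason.end_of_message: "eos",
            StopReason.out_of_tokens: "length",
        }.get(stop_reason)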