Improve groq OpenAI API compatibility

This doesn't get Groq to 100% on the OpenAI API verification tests,
but it does get it to 88.2% when Llama Stack is in the middle,
compared to 61.8% when pointing an OpenAI client at Groq directly.
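
For reference, the comparison looks roughly like this (a sketch only;
the local Llama Stack base URL, port, and api_key values below are
assumptions for a default deployment, not part of this change):

    import os
    from openai import OpenAI

    # Direct: an OpenAI client pointed at Groq's OpenAI-compatible
    # endpoint (the 61.8% path).
    direct = OpenAI(
        base_url="https://api.groq.com/openai/v1",
        api_key=os.environ["GROQ_API_KEY"],
    )

    # Through Llama Stack: the same client, pointed at the stack's
    # OpenAI-compatible endpoint and using the model id registered in
    # the verification run.yaml (the 88.2% path).
    via_stack = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
    resp = via_stack.chat.completions.create(
        model="groq/llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Say hello."}],
    )
    print(resp.choices[0].message.content)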

The groq provider doesn't use litellm under the covers in its
openai_chat_completion endpoint; instead, it uses an AsyncOpenAI
client directly, with some special handling to improve the
conformance of responses for response_format usage and tool calling.
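
A rough, hypothetical sketch of that approach (the client
construction, the "groq/" prefix handling, and the function shape
below are illustrative assumptions rather than the provider's actual
code; the response_format and tool-calling fixups are omitted):

    import os
    from openai import AsyncOpenAI

    # Talk to Groq's OpenAI-compatible endpoint directly instead of
    # going through litellm.
    _groq_client = AsyncOpenAI(
        base_url="https://api.groq.com/openai/v1",
        api_key=os.environ["GROQ_API_KEY"],
    )

    async def openai_chat_completion(model: str, messages: list[dict], **params):
        # The provider_model_ids in run.yaml carry a litellm-style
        # "groq/" prefix; assume it has to be dropped before calling
        # Groq directly.
        upstream_model = model.removeprefix("groq/")
        return await _groq_client.chat.completions.create(
            model=upstream_model, messages=messages, **params
        )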

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Ben Browning 2025-04-13 13:35:53 -04:00
parent 657bb12e85
commit 8a1c0a1008
16 changed files with 418 additions and 45 deletions


@@ -17,6 +17,11 @@ providers:
     config:
       url: https://api.fireworks.ai/inference/v1
       api_key: ${env.FIREWORKS_API_KEY}
+  - provider_id: groq
+    provider_type: remote::groq
+    config:
+      url: https://api.groq.com
+      api_key: ${env.GROQ_API_KEY}
   - provider_id: openai
     provider_type: remote::openai
     config:
@@ -98,6 +103,21 @@ models:
   provider_id: fireworks
   provider_model_id: accounts/fireworks/models/llama4-maverick-instruct-basic
   model_type: llm
+- metadata: {}
+  model_id: groq/llama-3.3-70b-versatile
+  provider_id: groq
+  provider_model_id: groq/llama-3.3-70b-versatile
+  model_type: llm
+- metadata: {}
+  model_id: groq/llama-4-scout-17b-16e-instruct
+  provider_id: groq
+  provider_model_id: groq/meta-llama/llama-4-scout-17b-16e-instruct
+  model_type: llm
+- metadata: {}
+  model_id: groq/llama-4-maverick-17b-128e-instruct
+  provider_id: groq
+  provider_model_id: groq/meta-llama/llama-4-maverick-17b-128e-instruct
+  model_type: llm
 - metadata: {}
   model_id: openai/gpt-4o
   provider_id: openai