Improve groq OpenAI API compatibility

This doesn't get Groq to 100% on the OpenAI API verification tests,
but it does get it to 88.2% when Llama Stack is in the middle,
compared to the 61.8% results for using an OpenAI client against Groq
directly.

The groq provider doesn't use litellm under the covers in its
openai_chat_completion endpoint, and instead directly uses an
AsyncOpenAI client with some special handling to improve conformance
of responses for response_format usage and tool calling.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
This commit is contained in:
Ben Browning 2025-04-13 13:35:53 -04:00
parent 657bb12e85
commit 8a1c0a1008
16 changed files with 418 additions and 45 deletions

View file

@ -43,7 +43,9 @@ The following models are available by default:
- `groq/llama-3.3-70b-versatile (aliases: meta-llama/Llama-3.3-70B-Instruct)`
- `groq/llama-3.2-3b-preview (aliases: meta-llama/Llama-3.2-3B-Instruct)`
- `groq/llama-4-scout-17b-16e-instruct (aliases: meta-llama/Llama-4-Scout-17B-16E-Instruct)`
- `groq/meta-llama/llama-4-scout-17b-16e-instruct (aliases: meta-llama/Llama-4-Scout-17B-16E-Instruct)`
- `groq/llama-4-maverick-17b-128e-instruct (aliases: meta-llama/Llama-4-Maverick-17B-128E-Instruct)`
- `groq/meta-llama/llama-4-maverick-17b-128e-instruct (aliases: meta-llama/Llama-4-Maverick-17B-128E-Instruct)`
### Prerequisite: API Keys