# What does this PR do?
update Groq inference provider to use OpenAIMixin for openai-compat
endpoints
changes on api.groq.com -
- json_schema is now supported for specific models, see
https://console.groq.com/docs/structured-outputs#supported-models
- response_format with streaming is now supported for models that
support response_format
- groq no longer returns a 400 error if tools are provided and
tool_choice is not "required"
## Test Plan
```
$ GROQ_API_KEY=... uv run llama stack build --image-type venv --providers inference=remote::groq --run
...
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model groq/llama-3.3-70b-versatile tests/integration/inference/test_openai_completion.py -k 'not store'
...
SKIPPED [3] tests/integration/inference/test_openai_completion.py:44: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support OpenAI completions.
SKIPPED [3] tests/integration/inference/test_openai_completion.py:94: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support vllm extra_body parameters.
SKIPPED [4] tests/integration/inference/test_openai_completion.py:73: Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support n param.
SKIPPED [1] tests/integration/inference/test_openai_completion.py💯 Model groq/llama-3.3-70b-versatile hosted by remote::groq doesn't support chat completion calls with base64 encoded files.
======================= 8 passed, 11 skipped, 8 deselected, 2 warnings in 5.13s ========================
```
---------
Co-authored-by: raghotham <rsm@meta.com>
# What does this PR do?
add an `OpenAIMixin` for use by inference providers who remote endpoints
support an OpenAI compatible API.
use is demonstrated by refactoring
- OpenAIInferenceAdapter
- NVIDIAInferenceAdapter (adds embedding support)
- LlamaCompatInferenceAdapter
## Test Plan
existing unit and integration tests
# What does this PR do?
Some of our inference providers support passthrough authentication via
`x-llamastack-provider-data` header values. This fixes the providers
that support passthrough auth to not cache their clients to the backend
providers (mostly OpenAI client instances) so that the client connecting
to Llama Stack has to provide those auth values on each and every
request.
## Test Plan
I added some unit tests to ensure we're not caching clients across
requests for all the fixed providers in this PR.
```
uv run pytest -sv tests/unit/providers/inference/test_inference_client_caching.py
```
I also ran some of our OpenAI compatible API integration tests for each
of the changed providers, just to ensure they still work. Note that
these providers don't actually pass all these tests (for unrelated
reasons due to quirks of the Groq and Together SaaS services), but
enough of the tests passed to confirm the clients are still working as
intended.
### Together
```
ENABLE_TOGETHER="together" \
uv run llama stack run llama_stack/templates/starter/run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv \
tests/integration/inference/test_openai_completion.py \
--text-model "together/meta-llama/Llama-3.1-8B-Instruct"
```
### OpenAI
```
ENABLE_OPENAI="openai" \
uv run llama stack run llama_stack/templates/starter/run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv \
tests/integration/inference/test_openai_completion.py \
--text-model "openai/gpt-4o-mini"
```
### Groq
```
ENABLE_GROQ="groq" \
uv run llama stack run llama_stack/templates/starter/run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv \
tests/integration/inference/test_openai_completion.py \
--text-model "groq/meta-llama/Llama-3.1-8B-Instruct"
```
---------
Signed-off-by: Ben Browning <bbrownin@redhat.com>