llama-stack-mirror/llama_stack/providers/remote/inference
Ben Browning e2960e9e44 fix: inference providers still using tools with tool_choice="none"
In our OpenAI API verification tests, some providers were still calling
tools even when `tool_choice="none"` was passed in the chat completion
requests. Because they aren't all respecting `tool_choice` properly,
this adjusts our routing implementation to remove the `tools` and
`tool_choice` from the request if `tool_choice="none"` is passed in so
that it does not attempt to call any of those tools. Adjusting this in
the router fixes this across all providers.

This also cleans up the non-streaming chat completion responses for tools,
ensuring it returns `None` instead of an empty list when there are no
tool calls, to exactly match the OpenAI API responses in that case.

I observed existing failures in our OpenAI API verification suite -
see
https://github.com/bbrowning/llama-stack-tests/blob/main/openai-api-verification/2025-04-27.md#together-llama-stack
for the failing `test_chat_*_tool_choice_none` tests. All streaming
and non-streaming variants were failing across all 3 tested models.

After this change, all of those 6 failing tests are now passing with
no regression in the other tests.

I verified this via:

```
llama stack run --image-type venv \
  tests/verifications/openai-api-verification-run.yaml
```

```
python -m pytest -s -v \
  'tests/verifications/openai_api/test_chat_completion.py' \
  --provider=together-llama-stack
```

The entire verification suite is not 100% on together.ai yet, but it's
getting closer.

This also increased the pass rate for fireworks.ai, and did not regress
the groq or openai tests at all.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-04-30 13:27:28 -04:00
..
anthropic feat(providers): Groq now uses LiteLLM openai-compat (#1303) 2025-02-27 13:16:50 -08:00
bedrock fix: 100% OpenAI API verification for together and fireworks (#1946) 2025-04-14 08:56:29 -07:00
cerebras fix: 100% OpenAI API verification for together and fireworks (#1946) 2025-04-14 08:56:29 -07:00
cerebras_openai_compat test: verification on provider's OAI endpoints (#1893) 2025-04-07 23:06:28 -07:00
databricks fix: 100% OpenAI API verification for together and fireworks (#1946) 2025-04-14 08:56:29 -07:00
fireworks fix: OpenAI Completions API and Fireworks (#1997) 2025-04-21 11:49:12 -07:00
fireworks_openai_compat test: verification on provider's OAI endpoints (#1893) 2025-04-07 23:06:28 -07:00
gemini feat(providers): Groq now uses LiteLLM openai-compat (#1303) 2025-02-27 13:16:50 -08:00
groq fix: 100% OpenAI API verification for together and fireworks (#1946) 2025-04-14 08:56:29 -07:00
groq_openai_compat test: verification on provider's OAI endpoints (#1893) 2025-04-07 23:06:28 -07:00
llama_openai_compat feat: add api.llama provider, llama-guard-4 model (#2058) 2025-04-29 10:07:41 -07:00
nvidia feat: NVIDIA allow non-llama model registration (#1859) 2025-04-24 17:13:33 -07:00
ollama fix: inference providers still using tools with tool_choice="none" 2025-04-30 13:27:28 -04:00
openai feat(providers): Groq now uses LiteLLM openai-compat (#1303) 2025-02-27 13:16:50 -08:00
passthrough fix: 100% OpenAI API verification for together and fireworks (#1946) 2025-04-14 08:56:29 -07:00
runpod fix: 100% OpenAI API verification for together and fireworks (#1946) 2025-04-14 08:56:29 -07:00
sambanova fix: 100% OpenAI API verification for together and fireworks (#1946) 2025-04-14 08:56:29 -07:00
sambanova_openai_compat test: verification on provider's OAI endpoints (#1893) 2025-04-07 23:06:28 -07:00
tgi fix: 100% OpenAI API verification for together and fireworks (#1946) 2025-04-14 08:56:29 -07:00
together fix: Together provider shutdown and default to non-streaming (#2001) 2025-04-22 17:47:53 +02:00
together_openai_compat test: verification on provider's OAI endpoints (#1893) 2025-04-07 23:06:28 -07:00
vllm fix: Added lazy initialization of the remote vLLM client to avoid issues with expired asyncio event loop (#1969) 2025-04-23 15:33:19 +02:00
watsonx fix: updated watsonx inference chat apis with new repo changes (#2033) 2025-04-26 10:17:52 -07:00
__init__.py impls -> inline, adapters -> remote (#381) 2024-11-06 14:54:05 -08:00