llama-stack/llama_stack/providers/remote/inference
Ben Browning 40e71758d9
fix: inference providers still using tools with tool_choice="none" (#2048)
# What does this PR do?

In our OpenAI API verification tests, some providers were still calling
tools even when `tool_choice="none"` was passed in the chat completion
request. Since not all providers respect `tool_choice` properly, this
PR adjusts our routing implementation to strip both `tools` and
`tool_choice` from the request whenever `tool_choice="none"` is passed,
so the provider never attempts to call any of those tools. Making the
change in the router fixes the behavior across all providers at once.
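
As a rough sketch (the function and parameter names here are
illustrative, not the actual llama-stack router code), the adjustment
amounts to something like:

```
# Illustrative only: strip tool parameters when the caller has
# explicitly opted out of tool calling with tool_choice="none".
def _prepare_chat_completion_params(params: dict) -> dict:
    if params.get("tool_choice") == "none":
        # Some providers still emit tool calls even when
        # tool_choice="none", so drop both fields before forwarding.
        params = dict(params)  # avoid mutating the caller's dict
        params.pop("tools", None)
        params.pop("tool_choice", None)
    return params
```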

This also cleans up the non-streaming together.ai tool responses,
ensuring `tool_calls` is `None` instead of an empty list when no tools
were called, exactly matching the OpenAI API response in that case.
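
A minimal sketch of that normalization (again with hypothetical names,
assuming the response message is handled as a dict):

```
# Illustrative only: OpenAI omits tool_calls (i.e. returns None) when
# no tools were called; normalize an empty list to match that shape.
def _normalize_tool_calls(message: dict) -> dict:
    if message.get("tool_calls") == []:
        return {**message, "tool_calls": None}
    return message
```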

## Test Plan

I observed existing failures in our OpenAI API verification suite; see
https://github.com/bbrowning/llama-stack-tests/blob/main/openai-api-verification/2025-04-27.md#together-llama-stack
for the failing `test_chat_*_tool_choice_none` tests. Both the
streaming and non-streaming variants were failing across all 3 tested
models.

After this change, all six of those failing tests pass, with no
regressions in the other tests.

I verified this by starting the stack server and then running the
verification tests against it:

```
llama stack run --image-type venv \
  tests/verifications/openai-api-verification-run.yaml
```

```
python -m pytest -s -v \
  'tests/verifications/openai_api/test_chat_completion.py' \
  --provider=together-llama-stack
```

The full verification suite does not yet pass 100% on together.ai, but
it's getting closer.

This change also increased the pass rate for fireworks.ai and did not
regress the groq or openai tests at all.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-05-07 14:34:47 +02:00

| Name | Last commit | Date |
| --- | --- | --- |
| anthropic | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| bedrock | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| cerebras | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| cerebras_openai_compat | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| databricks | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| fireworks | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| fireworks_openai_compat | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| gemini | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| groq | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| groq_openai_compat | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| llama_openai_compat | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| nvidia | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| ollama | fix: inference providers still using tools with tool_choice="none" (#2048) | 2025-05-07 14:34:47 +02:00 |
| openai | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| passthrough | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| runpod | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| sambanova | feat(providers): sambanova updated to use LiteLLM openai-compat (#1596) | 2025-05-06 16:50:22 -07:00 |
| sambanova_openai_compat | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| tgi | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| together | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| together_openai_compat | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| vllm | chore: more mypy fixes (#2029) | 2025-05-06 09:52:31 -07:00 |
| watsonx | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| `__init__.py` | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |