llama-stack-mirror/llama_stack/providers/remote/inference
Ben Browning · 934446ddb4
fix: ollama still using tools with tool_choice="none" (#2047)
# What does this PR do?

In our OpenAI API verification tests, ollama was still calling tools
even when `tool_choice="none"` was passed in its chat completion
requests. Because ollama does not respect `tool_choice` properly, this
adjusts our provider implementation to remove `tools` from the request
entirely when `tool_choice="none"` is passed in, so that the model
cannot attempt to call any of those tools.
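
For illustration, here's a minimal sketch of that adjustment (the
helper name and dict-based params are hypothetical, not the actual
provider code):

```
from typing import Any


def strip_tools_for_tool_choice_none(params: dict[str, Any]) -> dict[str, Any]:
    # ollama ignores tool_choice="none", so drop the tools (and the
    # now-meaningless tool_choice) before building the request.
    if params.get("tool_choice") == "none":
        params = dict(params)  # don't mutate the caller's params
        params.pop("tools", None)
        params.pop("tool_choice", None)
    return params
```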

## Test Plan

I tested this with a couple of Llama models, using both our OpenAI
completions integration tests and our verification test suites.

### OpenAI Completions / Chat Completions integration tests

These all passed before, and still do.

```
INFERENCE_MODEL="llama3.2:3b-instruct-fp16" \
  llama stack build --template ollama --image-type venv --run
```

```
LLAMA_STACK_CONFIG=http://localhost:8321 \
  python -m pytest -v \
  tests/integration/inference/test_openai_completion.py \
  --text-model "llama3.2:3b-instruct-fp16"
```

### OpenAI API Verification test suite

The `test_chat_*_tool_choice_none` OpenAI API verification tests now
pass, where they failed before. See
https://github.com/bbrowning/llama-stack-tests/blob/main/openai-api-verification/2025-04-27.md#ollama-llama-stack
for an example of these failures from a recent nightly CI run.

```
INFERENCE_MODEL="llama3.3:70b-instruct-q3_K_M" \
  llama stack build --template ollama --image-type venv --run
```

```
cat <<-EOF > tests/verifications/conf/ollama-llama-stack.yaml
base_url: http://localhost:8321/v1/openai/v1
api_key_var: OPENAI_API_KEY
models:
- llama3.3:70b-instruct-q3_K_M
model_display_names:
  llama3.3:70b-instruct-q3_K_M: Llama-3.3-70B-Instruct
test_exclusions:
  llama3.3:70b-instruct-q3_K_M:
  - test_chat_non_streaming_image
  - test_chat_streaming_image
  - test_chat_multi_turn_multiple_images
EOF
```

```
python -m pytest -s -v \
  'tests/verifications/openai_api/test_chat_completion.py' \
  --provider=ollama-llama-stack
```
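
For reference, a minimal sketch of what these `tool_choice="none"`
tests exercise, assuming the stack from the steps above is running
locally (the tool definition here is illustrative, not the verification
suite's actual fixture):

```
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

response = client.chat.completions.create(
    model="llama3.3:70b-instruct-q3_K_M",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    tool_choice="none",
)

# With the fix, the response must be plain text with no tool calls,
# even though tools were supplied in the request.
assert not response.choices[0].message.tool_calls
```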

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-04-29 10:45:28 +02:00

| Name | Last commit | Date |
|------|-------------|------|
| anthropic | feat(providers): Groq now uses LiteLLM openai-compat (#1303) | 2025-02-27 13:16:50 -08:00 |
| bedrock | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| cerebras | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| cerebras_openai_compat | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| databricks | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| fireworks | fix: OpenAI Completions API and Fireworks (#1997) | 2025-04-21 11:49:12 -07:00 |
| fireworks_openai_compat | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| gemini | feat(providers): Groq now uses LiteLLM openai-compat (#1303) | 2025-02-27 13:16:50 -08:00 |
| groq | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| groq_openai_compat | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| nvidia | feat: NVIDIA allow non-llama model registration (#1859) | 2025-04-24 17:13:33 -07:00 |
| ollama | fix: ollama still using tools with tool_choice="none" (#2047) | 2025-04-29 10:45:28 +02:00 |
| openai | feat(providers): Groq now uses LiteLLM openai-compat (#1303) | 2025-02-27 13:16:50 -08:00 |
| passthrough | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| runpod | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| sambanova | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| sambanova_openai_compat | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| tgi | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| together | fix: Together provider shutdown and default to non-streaming (#2001) | 2025-04-22 17:47:53 +02:00 |
| together_openai_compat | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| vllm | fix: Added lazy initialization of the remote vLLM client to avoid issues with expired asyncio event loop (#1969) | 2025-04-23 15:33:19 +02:00 |
| watsonx | fix: updated watsonx inference chat apis with new repo changes (#2033) | 2025-04-26 10:17:52 -07:00 |
| `__init__.py` | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |