llama-stack-mirror/llama_stack/providers/remote/inference
Ben Browning · 934446ddb4
fix: ollama still using tools with tool_choice="none" (#2047)
# What does this PR do?

In our OpenAI API verification tests, ollama was still calling tools
even when `tool_choice="none"` was passed in its chat completion
requests. Because ollama does not respect `tool_choice` properly, this
adjusts our provider implementation to remove `tools` from the request
entirely when `tool_choice="none"` is passed in, so that the model
cannot attempt to call any of those tools.
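
For illustration, here's a minimal sketch of that adjustment (the
helper name and dict-based params are hypothetical, not the actual
provider code):

```
from typing import Any


def strip_tools_for_tool_choice_none(params: dict[str, Any]) -> dict[str, Any]:
    # ollama ignores tool_choice="none", so drop the tools (and the
    # now-meaningless tool_choice) before building the request.
    if params.get("tool_choice") == "none":
        params = dict(params)  # don't mutate the caller's params
        params.pop("tools", None)
        params.pop("tool_choice", None)
    return params
```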

## Test Plan

I tested this with a couple of Llama models, using both our OpenAI
completions integration tests and our verification test suites.

### OpenAI Completions / Chat Completions integration tests

These all passed before, and still do.

```
INFERENCE_MODEL="llama3.2:3b-instruct-fp16" \
  llama stack build --template ollama --image-type venv --run
```

```
LLAMA_STACK_CONFIG=http://localhost:8321 \
  python -m pytest -v \
  tests/integration/inference/test_openai_completion.py \
  --text-model "llama3.2:3b-instruct-fp16"
```

### OpenAI API Verification test suite

The `test_chat_*_tool_choice_none` OpenAI API verification tests now
pass, where they failed before. See
https://github.com/bbrowning/llama-stack-tests/blob/main/openai-api-verification/2025-04-27.md#ollama-llama-stack
for an example of these failures from a recent nightly CI run.

```
INFERENCE_MODEL="llama3.3:70b-instruct-q3_K_M" \
  llama stack build --template ollama --image-type venv --run
```

```
cat <<-EOF > tests/verifications/conf/ollama-llama-stack.yaml
base_url: http://localhost:8321/v1/openai/v1
api_key_var: OPENAI_API_KEY
models:
- llama3.3:70b-instruct-q3_K_M
model_display_names:
  llama3.3:70b-instruct-q3_K_M: Llama-3.3-70B-Instruct
test_exclusions:
  llama3.3:70b-instruct-q3_K_M:
  - test_chat_non_streaming_image
  - test_chat_streaming_image
  - test_chat_multi_turn_multiple_images
EOF
```

```
python -m pytest -s -v \
  'tests/verifications/openai_api/test_chat_completion.py' \
  --provider=ollama-llama-stack
```
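
For reference, a minimal sketch of what these `tool_choice="none"`
tests exercise, assuming the stack from the steps above is running
locally (the tool definition here is illustrative, not the verification
suite's actual fixture):

```
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

response = client.chat.completions.create(
    model="llama3.3:70b-instruct-q3_K_M",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    tool_choice="none",
)

# With the fix, the response must be plain text with no tool calls,
# even though tools were supplied in the request.
assert not response.choices[0].message.tool_calls
```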

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-04-29 10:45:28 +02:00

| Name | Last commit | Date |
|------|-------------|------|
| anthropic | feat(providers): Groq now uses LiteLLM openai-compat (#1303) | 2025-02-27 13:16:50 -08:00 |
| bedrock | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| cerebras | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| cerebras_openai_compat | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| databricks | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| fireworks | fix: OpenAI Completions API and Fireworks (#1997) | 2025-04-21 11:49:12 -07:00 |
| fireworks_openai_compat | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| gemini | feat(providers): Groq now uses LiteLLM openai-compat (#1303) | 2025-02-27 13:16:50 -08:00 |
| groq | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| groq_openai_compat | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| nvidia | feat: NVIDIA allow non-llama model registration (#1859) | 2025-04-24 17:13:33 -07:00 |
| ollama | fix: ollama still using tools with tool_choice="none" (#2047) | 2025-04-29 10:45:28 +02:00 |
| openai | feat(providers): Groq now uses LiteLLM openai-compat (#1303) | 2025-02-27 13:16:50 -08:00 |
| passthrough | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| runpod | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| sambanova | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| sambanova_openai_compat | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| tgi | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| together | fix: Together provider shutdown and default to non-streaming (#2001) | 2025-04-22 17:47:53 +02:00 |
| together_openai_compat | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| vllm | fix: Added lazy initialization of the remote vLLM client to avoid issues with expired asyncio event loop (#1969) | 2025-04-23 15:33:19 +02:00 |
| watsonx | fix: updated watsonx inference chat apis with new repo changes (#2033) | 2025-04-26 10:17:52 -07:00 |
| `__init__.py` | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |