llama-stack

forked from phoenix-oss/llama-stack-mirror


Ben Browning c64f0d5888 fix: Get builtin tool calling working in remote-vllm (#1236)

# What does this PR do?

This PR makes a couple of changes required to get the test `tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search` passing on the remote-vllm provider.

First, we adjust agent_instance to also pass in the description and parameters of builtin tools. We need these parameters so we can pass the tool's expected parameters into vLLM. The meta-reference implementations may not have needed these for builtin tools, as they are able to take advantage of the Llama-model specific support for certain builtin tools. However, with vLLM, our server-side chat templates for tool calling treat all tools the same and don't separate out Llama builtin vs custom tools. So, we need to pass the full set of parameter definitions and list of required parameters for builtin tools as well.

Next, we adjust the vllm streaming chat completion code to fix up some edge cases where it was returning an extra ChatCompletionResponseEvent with an empty ToolCall with empty string call_id, tool_name, and arguments properties. This is a bug discovered after the above fix, where after a successful tool invocation we were sending extra chunks back to the client with these empty ToolCalls.
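The two changes described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ToolParam`, `ToolDef`, `ToolCall`, `to_openai_tool`, and `drop_empty_tool_calls` are hypothetical names made up for the example, and the `web_search` tool with a single `query` parameter is an assumed shape. The only external assumption is that the vLLM server receives tools in the OpenAI-style `{"type": "function", ...}` format with JSON Schema parameters.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, List


@dataclass
class ToolParam:
    # Hypothetical parameter description, not the llama-stack type.
    name: str
    param_type: str
    description: str
    required: bool = True


@dataclass
class ToolDef:
    # Hypothetical tool definition; builtin and custom tools share this shape.
    name: str
    description: str
    params: List[ToolParam] = field(default_factory=list)


@dataclass
class ToolCall:
    # Mirrors the three properties named in the PR description.
    call_id: str
    tool_name: str
    arguments: str


def to_openai_tool(tool: ToolDef) -> Dict[str, Any]:
    """Render any tool, builtin or custom, as an OpenAI-style function tool."""
    properties = {
        p.name: {"type": p.param_type, "description": p.description}
        for p in tool.params
    }
    required = [p.name for p in tool.params if p.required]
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def drop_empty_tool_calls(calls: Iterable[ToolCall]) -> Iterator[ToolCall]:
    """Skip tool calls whose call_id, tool_name, and arguments are all empty."""
    for call in calls:
        if not call.call_id and not call.tool_name and not call.arguments:
            continue
        yield call


# Assumed shape of a builtin web search tool, for illustration only.
web_search = ToolDef(
    name="web_search",
    description="Search the web for a query",
    params=[ToolParam("query", "string", "The query to search for")],
)
rendered = to_openai_tool(web_search)

calls = [ToolCall("abc", "web_search", '{"query": "x"}'), ToolCall("", "", "")]
kept = list(drop_empty_tool_calls(calls))
```

In this sketch the builtin tool carries its full parameter schema and required list, so a chat template that treats all tools alike has what it needs, and the trailing empty `ToolCall` is dropped before it would be turned into an extra event for the client.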
## Test Plan

With these changes, the following test that previously failed now passes:

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=remote-vllm \
python -m pytest -v \
tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search \
--inference-model "meta-llama/Llama-3.2-3B-Instruct"
```

Additionally, I ran the remote-vllm client-sdk and provider inference tests as below to ensure they all still passed with this change:

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=remote-vllm \
python -m pytest -v \
tests/client-sdk/inference/test_text_inference.py \
--inference-model "meta-llama/Llama-3.2-3B-Instruct"
```

```
VLLM_URL="http://localhost:8000/v1" \
python -m pytest -s -v \
llama_stack/providers/tests/inference/test_text_inference.py \
--providers "inference=vllm_remote"
```

[//]: # (## Documentation)

Signed-off-by: Ben Browning <bbrownin@redhat.com>

2025-02-26 15:25:47 -05:00
..
anthropic	feat: add (openai, anthropic, gemini) providers via litellm (#1267)	2025-02-25 22:07:33 -08:00
bedrock	feat(api): Add options for supporting various embedding models (#1192)	2025-02-20 22:27:12 -08:00
cerebras	feat(api): Add options for supporting various embedding models (#1192)	2025-02-20 22:27:12 -08:00
databricks	fix: resolve type hint issues and import dependencies (#1176)	2025-02-25 11:06:47 -08:00
fireworks	feat: add (openai, anthropic, gemini) providers via litellm (#1267)	2025-02-25 22:07:33 -08:00
gemini	feat: add (openai, anthropic, gemini) providers via litellm (#1267)	2025-02-25 22:07:33 -08:00
groq	feat: Add Groq distribution template (#1173)	2025-02-25 14:16:56 -08:00
nvidia	refactor: move OpenAI compat utilities from nvidia to openai_compat (#1258)	2025-02-25 22:02:11 -08:00
ollama	feat(providers): support non-llama models for inference providers (#1200)	2025-02-21 13:21:28 -08:00
openai	fix: make vision and embedding tests pass with openai, anthropic and gemini	2025-02-26 11:24:01 -08:00
passthrough	feat(api): Add options for supporting various embedding models (#1192)	2025-02-20 22:27:12 -08:00
runpod	feat(api): Add options for supporting various embedding models (#1192)	2025-02-20 22:27:12 -08:00
sambanova	feat(api): Add options for supporting various embedding models (#1192)	2025-02-20 22:27:12 -08:00
sample	build: format codebase imports using ruff linter (#1028)	2025-02-13 10:06:21 -08:00
tgi	feat(api): Add options for supporting various embedding models (#1192)	2025-02-20 22:27:12 -08:00
together	feat(providers): support non-llama models for inference providers (#1200)	2025-02-21 13:21:28 -08:00
vllm	fix: Get builtin tool calling working in remote-vllm (#1236)	2025-02-26 15:25:47 -05:00
__init__.py	`impls` -> `inline`, `adapters` -> `remote` (#381)	2024-11-06 14:54:05 -08:00