llama-stack/llama_stack/providers/remote/inference
Xi Yan 66d7e15c93
perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041)
# What does this PR do?

**Problem**
- Using this script:
https://gist.github.com/thoraxe/6163b2145ce7b1c24c6026b64cf90085

- The server fails with a `code_interpreter` not found error, because we
do not pass "builtin::code_interpreter" in the AgentConfig's `toolgroups`.

This is a general issue: the model always tries to output a
`code_interpreter` `ToolCall`, even when `code_interpreter` is not
available for execution.
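
For context, a minimal sketch of the failing setup (assuming the `llama_stack_client` Python SDK; exact signatures may differ from the linked gist):

```python
# Hypothetical reproduction sketch -- not the linked gist verbatim.
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.types.agent_create_params import AgentConfig

client = LlamaStackClient(base_url="http://localhost:8321")

# "builtin::code_interpreter" is deliberately absent from `toolgroups`.
agent_config = AgentConfig(
    model="meta-llama/Llama-3.3-70B-Instruct",
    instructions="You are a helpful assistant.",
    toolgroups=["builtin::websearch"],
    enable_session_persistence=False,
)
agent = Agent(client, agent_config)
session_id = agent.create_session("repro-session")

# Before this fix, the model could still answer with <|python_tag|>...,
# which the server decoded into a `code_interpreter` ToolCall and then
# failed because no such tool was registered for this agent.
turn = agent.create_turn(
    messages=[{"role": "user", "content": "What is 123 * 456?"}],
    session_id=session_id,
)
for chunk in turn:
    pass  # consume the stream; the failure surfaced server-side
```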

**Reproducing the deeper problem in chat-completion**
- Use this script:
https://gist.github.com/yanxi0830/163a9ad7b5db10556043fbfc7ecd7603

1. We currently always populate a `code_interpreter` `ToolCall` in the
`ChatCompletionResponse` if the model's response begins with
`<|python_tag|>`. See
c5f5958498/models/llama3/api/chat_format.py (L200-L213)

<img width="913" alt="image"
src="https://github.com/user-attachments/assets/328d313d-0a0b-495c-8715-61cca9ccc4a6"
/>

2. This happens even if we do not pass `code_interpreter` as one of the
`tools` in the `ChatCompletionRequest`.
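
A hedged sketch of the chat-completion reproduction (tool schema abbreviated; not the linked gist verbatim):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Compute 2 ** 32 for me."}],
    # Only a weather tool is offered; code_interpreter is NOT in the request.
    tools=[{"tool_name": "get_weather", "description": "Get the weather for a city"}],
)

# Before this PR: if the raw completion began with <|python_tag|>, the
# decoder still returned a ToolCall with tool_name="code_interpreter".
print(response.completion_message.tool_calls)
```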

**This PR**

Explicitly ensure that every tool call returned in
`ChatCompletionResponse.tool_calls` refers to a tool requested in
`ChatCompletionRequest.tools`.
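
Conceptually, the guard looks like the following (a hand-written sketch, not the exact diff; names mirror the llama-stack request/response types):

```python
def filter_tool_calls(completion_message, request_tools):
    """Drop any decoded ToolCall whose tool was not offered in the request."""
    allowed = {tool.tool_name for tool in (request_tools or [])}
    completion_message.tool_calls = [
        tc for tc in completion_message.tool_calls if tc.tool_name in allowed
    ]
    return completion_message
```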


## Test Plan

**Before**
<img width="913" alt="image"
src="https://github.com/user-attachments/assets/328d313d-0a0b-495c-8715-61cca9ccc4a6"
/>
<img width="997" alt="image"
src="https://github.com/user-attachments/assets/d3e82b62-b142-4939-954c-62843bec7110"
/>


**After**
<img width="856" alt="image"
src="https://github.com/user-attachments/assets/2c70ce55-c8d0-45ea-b10f-f70adc50d3d9"
/>
<img width="1000" alt="image"
src="https://github.com/user-attachments/assets/b5e81826-c35b-4052-bf81-7afff93ce2ef"
/>



**Unit Test**
```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```
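
The shape of the assertion in that test is roughly the following (a sketch; fixture names and tool schema are assumed, not copied from the test file):

```python
def test_text_chat_completion_tool_calling_tools_not_in_request(
    llama_stack_client, text_model_id
):
    response = llama_stack_client.inference.chat_completion(
        model_id=text_model_id,
        messages=[{"role": "user", "content": "Run some Python to add 2 + 2."}],
        tools=[{"tool_name": "get_weather", "description": "Get the weather"}],
    )
    tool_calls = response.completion_message.tool_calls or []
    # No call may reference a tool that was not offered in the request.
    assert all(tc.tool_name == "get_weather" for tc in tool_calls)
```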

```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/
```
<img width="1002" alt="image"
src="https://github.com/user-attachments/assets/04808517-eded-4122-97f5-7e5142de9779"
/>



**Streaming**
- Chat Completion
<img width="902" alt="image"
src="https://github.com/user-attachments/assets/f477bc86-bd38-4729-b49e-a0a6ed3f835a"
/>

- Agent
<img width="916" alt="image"
src="https://github.com/user-attachments/assets/f4cc3417-23cd-46b1-953d-3a2271e79bbb"
/>
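
The same guarantee applies to streamed responses; a minimal usage sketch (call and chunk shapes assumed from the `llama_stack_client` SDK):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# With this PR, tool-call deltas for tools not offered in the request
# should never appear in the stream.
for chunk in client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Compute 2 ** 32 for me."}],
    tools=[{"tool_name": "get_weather", "description": "Get the weather"}],
    stream=True,
):
    print(chunk.event.event_type, chunk.event.delta)
```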


Committed 2025-02-11 18:31:35 -08:00
| Name | Last commit | Date |
| --- | --- | --- |
| bedrock | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| cerebras | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| databricks | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| fireworks | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| groq | chore: add missing ToolConfig import in groq.py (#983) | 2025-02-07 09:35:00 -08:00 |
| nvidia | feat: Add a new template for dell (#978) | 2025-02-06 14:14:39 -08:00 |
| ollama | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| runpod | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| sambanova | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| sample | [remove import *] clean up import *'s (#689) | 2024-12-27 15:45:44 -08:00 |
| tgi | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| together | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| vllm | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| __init__.py | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |