llama-stack/llama_stack/providers/remote/inference
Xi Yan 66d7e15c93
perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041)
# What does this PR do?

**Problem**
- Using this script:
https://gist.github.com/thoraxe/6163b2145ce7b1c24c6026b64cf90085

- The server fails with a `code_interpreter` not found error, because we
do not pass "builtin::code_interpreter" in the AgentConfig's `toolgroups`.

This is a general issue: the model always tries to output a
`code_interpreter` `ToolCall`, even when `code_interpreter` is not
available for execution.
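
For context, a minimal sketch of the failing setup (assuming the `llama_stack_client` Python SDK; exact signatures may differ from the linked gist):

```python
# Hypothetical reproduction sketch -- not the linked gist verbatim.
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.types.agent_create_params import AgentConfig

client = LlamaStackClient(base_url="http://localhost:8321")

# "builtin::code_interpreter" is deliberately absent from `toolgroups`.
agent_config = AgentConfig(
    model="meta-llama/Llama-3.3-70B-Instruct",
    instructions="You are a helpful assistant.",
    toolgroups=["builtin::websearch"],
    enable_session_persistence=False,
)
agent = Agent(client, agent_config)
session_id = agent.create_session("repro-session")

# Before this fix, the model could still answer with <|python_tag|>...,
# which the server decoded into a `code_interpreter` ToolCall and then
# failed because no such tool was registered for this agent.
turn = agent.create_turn(
    messages=[{"role": "user", "content": "What is 123 * 456?"}],
    session_id=session_id,
)
for chunk in turn:
    pass  # consume the stream; the failure surfaced server-side
```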

**Reproducing the deeper problem in chat-completion**
- Use this script:
https://gist.github.com/yanxi0830/163a9ad7b5db10556043fbfc7ecd7603

1. We currently always populate a `code_interpreter` `ToolCall` in the
`ChatCompletionResponse` if the model's response begins with
`<|python_tag|>`. See
c5f5958498/models/llama3/api/chat_format.py (L200-L213)

<img width="913" alt="image"
src="https://github.com/user-attachments/assets/328d313d-0a0b-495c-8715-61cca9ccc4a6"
/>

2. This happens even if we do not pass `code_interpreter` as one of the
`tools` in the `ChatCompletionRequest`.
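
A hedged sketch of the chat-completion reproduction (tool schema abbreviated; not the linked gist verbatim):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Compute 2 ** 32 for me."}],
    # Only a weather tool is offered; code_interpreter is NOT in the request.
    tools=[{"tool_name": "get_weather", "description": "Get the weather for a city"}],
)

# Before this PR: if the raw completion began with <|python_tag|>, the
# decoder still returned a ToolCall with tool_name="code_interpreter".
print(response.completion_message.tool_calls)
```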

**This PR**

Explicitly ensure that every tool call returned in
`ChatCompletionResponse.tool_calls` refers to a tool requested in
`ChatCompletionRequest.tools`.
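
Conceptually, the guard looks like the following (a hand-written sketch, not the exact diff; names mirror the llama-stack request/response types):

```python
def filter_tool_calls(completion_message, request_tools):
    """Drop any decoded ToolCall whose tool was not offered in the request."""
    allowed = {tool.tool_name for tool in (request_tools or [])}
    completion_message.tool_calls = [
        tc for tc in completion_message.tool_calls if tc.tool_name in allowed
    ]
    return completion_message
```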


## Test Plan

**Before**
<img width="913" alt="image"
src="https://github.com/user-attachments/assets/328d313d-0a0b-495c-8715-61cca9ccc4a6"
/>
<img width="997" alt="image"
src="https://github.com/user-attachments/assets/d3e82b62-b142-4939-954c-62843bec7110"
/>


**After**
<img width="856" alt="image"
src="https://github.com/user-attachments/assets/2c70ce55-c8d0-45ea-b10f-f70adc50d3d9"
/>
<img width="1000" alt="image"
src="https://github.com/user-attachments/assets/b5e81826-c35b-4052-bf81-7afff93ce2ef"
/>



**Unit Test**
```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```
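
The shape of the assertion in that test is roughly the following (a sketch; fixture names and tool schema are assumed, not copied from the test file):

```python
def test_text_chat_completion_tool_calling_tools_not_in_request(
    llama_stack_client, text_model_id
):
    response = llama_stack_client.inference.chat_completion(
        model_id=text_model_id,
        messages=[{"role": "user", "content": "Run some Python to add 2 + 2."}],
        tools=[{"tool_name": "get_weather", "description": "Get the weather"}],
    )
    tool_calls = response.completion_message.tool_calls or []
    # No call may reference a tool that was not offered in the request.
    assert all(tc.tool_name == "get_weather" for tc in tool_calls)
```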

```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/
```
<img width="1002" alt="image"
src="https://github.com/user-attachments/assets/04808517-eded-4122-97f5-7e5142de9779"
/>



**Streaming**
- Chat Completion
<img width="902" alt="image"
src="https://github.com/user-attachments/assets/f477bc86-bd38-4729-b49e-a0a6ed3f835a"
/>

- Agent
<img width="916" alt="image"
src="https://github.com/user-attachments/assets/f4cc3417-23cd-46b1-953d-3a2271e79bbb"
/>
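
The same guarantee applies to streamed responses; a minimal usage sketch (call and chunk shapes assumed from the `llama_stack_client` SDK):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# With this PR, tool-call deltas for tools not offered in the request
# should never appear in the stream.
for chunk in client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Compute 2 ** 32 for me."}],
    tools=[{"tool_name": "get_weather", "description": "Get the weather"}],
    stream=True,
):
    print(chunk.event.event_type, chunk.event.delta)
```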


Committed 2025-02-11 18:31:35 -08:00
| Name | Last commit | Date |
| --- | --- | --- |
| bedrock | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| cerebras | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| databricks | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| fireworks | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| groq | chore: add missing ToolConfig import in groq.py (#983) | 2025-02-07 09:35:00 -08:00 |
| nvidia | feat: Add a new template for dell (#978) | 2025-02-06 14:14:39 -08:00 |
| ollama | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| runpod | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| sambanova | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| sample | [remove import *] clean up import *'s (#689) | 2024-12-27 15:45:44 -08:00 |
| tgi | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| together | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| vllm | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| __init__.py | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |