llama-stack-mirror/llama_stack/providers/remote/inference
Ben Browning 9f2a7e6a74 fix: multiple tool calls in remote-vllm chat_completion
This fixes an issue in how we used the tool_call_buf when streaming
tool calls in the remote-vllm provider, where it would end up
concatenating parameters from multiple different tool calls into a
single result instead of aggregating the results of each tool call
separately.
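
For context, here is a minimal sketch of the per-index aggregation
pattern (a hypothetical helper, not the provider's actual code),
assuming OpenAI-style streaming deltas where each fragment carries an
`index` identifying which tool call it belongs to:

```python
# Minimal sketch (not the provider's actual code) of buffering streamed
# tool-call deltas per call index instead of in one shared buffer.
from collections import defaultdict

def aggregate_tool_call_deltas(deltas: list[dict]) -> list[dict]:
    """Aggregate OpenAI-style streaming tool-call deltas.

    Each delta is assumed to look like
    {"index": 0, "function": {"name": "get_weather", "arguments": '{"ci'}},
    where "index" says which tool call a fragment belongs to and
    "arguments" arrives as partial JSON text across many chunks.
    """
    # One buffer per tool-call index. A single shared buffer would
    # interleave argument fragments from different calls -- the bug
    # this commit fixes.
    buffers: dict[int, dict[str, str]] = defaultdict(
        lambda: {"name": "", "arguments": ""}
    )
    for delta in deltas:
        buf = buffers[delta["index"]]
        fn = delta.get("function", {})
        buf["name"] += fn.get("name") or ""
        buf["arguments"] += fn.get("arguments") or ""
    return [buffers[i] for i in sorted(buffers)]
```

With a single shared buffer, the argument fragments of call 0 and
call 1 would be glued together into one unparseable string; keying the
buffers by index keeps each call's JSON intact.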

It also fixes an issue found while digging into that one, where we
were accidentally mixing the JSON string form of tool call parameters
with the string representation of the Python dict form, which meant
we'd end up with single quotes in what should be double-quoted JSON
strings.
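
To illustrate the quoting problem (a generic example, not the
provider's code): calling `str()` on a Python dict yields its repr
with single quotes, while `json.dumps()` yields valid, double-quoted
JSON:

```python
import json

args = {"city": "Boston", "units": "celsius"}

# str() gives the Python repr -- single quotes, not valid JSON:
print(str(args))         # {'city': 'Boston', 'units': 'celsius'}

# json.dumps() gives a proper double-quoted JSON string:
print(json.dumps(args))  # {"city": "Boston", "units": "celsius"}
```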

The following tests now pass 100% for the remote-vllm provider; some
of the test_text_inference tests were failing before this change:

```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/inference/test_text_inference.py --text-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"

VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/inference/test_vision_inference.py --vision-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"

```

Many of the agent tests are passing, although some still fail due to
bugs in vLLM's pythonic tool parser for Llama models. See the PR at
https://github.com/vllm-project/vllm/pull/17917 and the gist at
https://gist.github.com/bbrowning/b5007709015cb2aabd85e0bd08e6d60f for
the changes needed there, which will have to be made upstream in vLLM.

Agent tests:

```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic" LLAMA_STACK_CONFIG=remote-vllm python -m pytest -v tests/integration/agents/test_agents.py --text-model "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-05-14 20:58:57 -04:00
| Name | Last commit | Date |
|---|---|---|
| anthropic | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| bedrock | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| cerebras | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| cerebras_openai_compat | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| databricks | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| fireworks | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| fireworks_openai_compat | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| gemini | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| groq | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| groq_openai_compat | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| llama_openai_compat | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| nvidia | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| ollama | fix: ollama openai completion and chat completion params (#2125) | 2025-05-12 10:57:53 -07:00 |
| openai | feat: expand set of known openai models, allow using openai canonical model names (#2164) | 2025-05-14 13:18:15 -07:00 |
| passthrough | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| runpod | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| sambanova | feat(providers): sambanova updated to use LiteLLM openai-compat (#1596) | 2025-05-06 16:50:22 -07:00 |
| sambanova_openai_compat | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| tgi | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| together | fix: revert "feat(provider): adding llama4 support in together inference provider (#2123)" (#2124) | 2025-05-08 15:18:16 -07:00 |
| together_openai_compat | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| vllm | fix: multiple tool calls in remote-vllm chat_completion | 2025-05-14 20:58:57 -04:00 |
| watsonx | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| __init__.py | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |