llama-stack/llama_stack/providers/remote/inference
Ben Browning dd1a366347
fix: logprobs support in remote-vllm provider (#1074)
# What does this PR do?

The remote-vllm provider was not passing logprobs options from
`CompletionRequest` or `ChatCompletionRequest` through to the OpenAI client
parameters. I verified this manually and also observed the provider failing
`TestInference::test_completion_logprobs`. This was filed as issue #1073.

This fixes that by passing the `logprobs.top_k` value through to the
parameters we pass into the OpenAI client.
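
As a rough sketch of the change (not the exact diff), the idea is to forward
`top_k` from the request's logprobs config into the keyword arguments built
for the OpenAI-compatible client; the `input_dict` name and shape below are
assumptions for illustration:

```python
# Hedged sketch: when the request asks for logprobs, forward top_k to the
# OpenAI-compatible `logprobs` parameter understood by vLLM's server.
if request.logprobs and request.logprobs.top_k:
    input_dict["logprobs"] = request.logprobs.top_k
```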

Additionally, this fixes a bug in `test_text_inference.py` that mistakenly
assumed `chunk.delta` was of type `ContentDelta` for completion requests.
The deltas are of type `ContentDelta` for chat completion requests, but for
basic completion requests they are plain strings. That test bug likely
caused failures for other providers that do properly support logprobs, and
it surfaced while fixing the remote-vllm issue above (see the sketch below).
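
A minimal sketch of the corrected handling, assuming the test iterates over
streamed chunks; the variable names here are illustrative, not the exact
test code:

```python
# Hedged sketch: accept both delta shapes in the streaming test.
for chunk in chunks:
    delta = chunk.delta
    # Basic completion streams yield plain strings; chat completion streams
    # yield ContentDelta objects, so only the latter needs attribute access.
    text = delta if isinstance(delta, str) else delta.text
    assert isinstance(text, str)
```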

(Closes #1073)

## Test Plan

First, you need a vLLM server running. I started one locally like this:
```
vllm serve meta-llama/Llama-3.2-3B-Instruct --port 8001 --enable-auto-tool-choice --tool-call-parser llama3_json
```

Next, run `test_text_inference.py` against that server using the remote vLLM
provider:
```
VLLM_URL="http://localhost:8001/v1" python -m pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py --providers "inference=vllm_remote"
```

Before my change, the test failed with this error:
```
llama_stack/providers/tests/inference/test_text_inference.py:155: in test_completion_logprobs
    assert 1 <= len(response.logprobs) <= 5
E   TypeError: object of type 'NoneType' has no len()
```

After my change, the test passes.


Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-02-13 11:00:00 -05:00
| Name | Last commit | Date |
|------|-------------|------|
| bedrock | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| cerebras | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| databricks | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| fireworks | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| groq | feat: Support tool calling for streaming chat completion in remote vLLM provider (#1063) | 2025-02-12 06:17:21 -08:00 |
| nvidia | feat: Add a new template for dell (#978) | 2025-02-06 14:14:39 -08:00 |
| ollama | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| runpod | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| sambanova | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| sample | [remove import *] clean up import *'s (#689) | 2024-12-27 15:45:44 -08:00 |
| tgi | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| together | perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041) | 2025-02-11 18:31:35 -08:00 |
| vllm | fix: logprobs support in remote-vllm provider (#1074) | 2025-02-13 11:00:00 -05:00 |
| __init__.py | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |