feat: Add dynamic authentication token forwarding support for vLLM (#3388)

# What does this PR do?

*Add dynamic authentication token forwarding support for vLLM provider*

This enables per-request authentication tokens for vLLM providers, supporting use cases like RAG operations where different requests may need different authentication tokens. The implementation follows the same pattern as other providers like Together AI, Fireworks, and Passthrough.

- Add LiteLLMOpenAIMixin that manages the vllm_api_token properly

Usage:
- Static: VLLM_API_TOKEN env var or config.api_token
- Dynamic: X-LlamaStack-Provider-Data header with vllm_api_token

All existing functionality is preserved while adding new dynamic capabilities.

## Test Plan

```
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Authorization: Bearer my-dynamic-token" \
  -H "X-LlamaStack-Provider-Data: {\"vllm_api_token\": \"Bearer my-dynamic-token\", \"vllm_url\": \"http://dynamic-server:8000\"}" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8b", "messages": [{"role": "user", "content": "Hello!"}]}'
```

---------

Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
2025-12-03 18:00:36 +00:00 · 2025-09-18 10:13:55 +01:00 · 4842145202
commit 4842145202
parent 42c23b45f6
4 changed files with 219 additions and 48 deletions
--- a/llama_stack/providers/registry/inference.py
+++ b/llama_stack/providers/registry/inference.py
@@ -78,6 +78,7 @@ def available_providers() -> list[ProviderSpec]:
                pip_packages=[],
                module="llama_stack.providers.remote.inference.vllm",
                config_class="llama_stack.providers.remote.inference.vllm.VLLMInferenceAdapterConfig",
+                provider_data_validator="llama_stack.providers.remote.inference.vllm.VLLMProviderDataValidator",
                description="Remote vLLM inference provider for connecting to vLLM servers.",
            ),
        ),
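
The hunk above only registers a validator path. As a rough illustration of the static-versus-dynamic behaviour described in the commit message, the sketch below parses an `X-LlamaStack-Provider-Data` header value and falls back to a static token when the header carries none. It uses only the standard library; `ProviderData`, `parse_provider_data`, and `resolve_api_token` are hypothetical names, not the PR's actual `VLLMProviderDataValidator` or adapter code, and the "header wins over static config" precedence is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderData:
    # Hypothetical stand-in for the fields a provider-data validator might accept.
    vllm_api_token: Optional[str] = None
    vllm_url: Optional[str] = None


def parse_provider_data(header_value: Optional[str]) -> ProviderData:
    """Parse the JSON payload of the X-LlamaStack-Provider-Data header."""
    if not header_value:
        return ProviderData()
    raw = json.loads(header_value)
    if not isinstance(raw, dict):
        raise ValueError("provider data must be a JSON object")
    return ProviderData(
        vllm_api_token=raw.get("vllm_api_token"),
        vllm_url=raw.get("vllm_url"),
    )


def resolve_api_token(header_value: Optional[str], static_token: Optional[str]) -> Optional[str]:
    """Return the per-request token if present, else the static one (assumed precedence)."""
    data = parse_provider_data(header_value)
    return data.vllm_api_token or static_token


if __name__ == "__main__":
    header = json.dumps({"vllm_api_token": "my-dynamic-token"})
    print(resolve_api_token(header, "static-token"))
    print(resolve_api_token(None, "static-token"))
```

Under these assumptions, a request that carries `vllm_api_token` in the header uses that token, while a request without the header keeps using the statically configured one, which matches the claim that existing behaviour is preserved.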