feat: Add dynamic authentication token forwarding support for vLLM (#3388)

# What does this PR do?


*Add dynamic authentication token forwarding support for vLLM provider*

This enables per-request authentication tokens for vLLM providers,
supporting use cases like RAG operations where different requests may
need different authentication tokens. The implementation follows the
same pattern as other providers like Together AI, Fireworks, and
Passthrough.

- Add LiteLLMOpenAIMixin that manages the vllm_api_token properly

Usage:

- Static: VLLM_API_TOKEN env var or config.api_token
- Dynamic: X-LlamaStack-Provider-Data header with vllm_api_token
All existing functionality is preserved while adding new dynamic
capabilities.


<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

```
curl -X POST "http://localhost:8000/v1/chat/completions" -H "Authorization: Bearer my-dynamic-token" \
  -H "X-LlamaStack-Provider-Data: {\"vllm_api_token\": \"Bearer my-dynamic-token\", \"vllm_url\": \"http://dynamic-server:8000\"}" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8b", "messages": [{"role": "user", "content": "Hello!"}]}'
  
```

---------

Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
This commit is contained in:
Akram Ben Aissi 2025-09-18 10:13:55 +01:00 committed by GitHub
parent 42c23b45f6
commit 4842145202
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
4 changed files with 219 additions and 48 deletions

View file

@ -78,6 +78,7 @@ def available_providers() -> list[ProviderSpec]:
pip_packages=[],
module="llama_stack.providers.remote.inference.vllm",
config_class="llama_stack.providers.remote.inference.vllm.VLLMInferenceAdapterConfig",
provider_data_validator="llama_stack.providers.remote.inference.vllm.VLLMProviderDataValidator",
description="Remote vLLM inference provider for connecting to vLLM servers.",
),
),