llama-stack-mirror/llama_stack/providers/remote/inference
Akram Ben Aissi 4842145202
feat: Add dynamic authentication token forwarding support for vLLM (#3388)
# What does this PR do?


*Add dynamic authentication token forwarding support for vLLM provider*

This enables per-request authentication tokens for vLLM providers,
supporting use cases like RAG operations where different requests may
need different authentication tokens. The implementation follows the
same pattern as other providers like Together AI, Fireworks, and
Passthrough.

- Add LiteLLMOpenAIMixin that manages the vllm_api_token properly

Usage:

- Static: VLLM_API_TOKEN env var or config.api_token
- Dynamic: X-LlamaStack-Provider-Data header with vllm_api_token

All existing functionality is preserved while adding new dynamic
capabilities.
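
The two usage modes above can be sketched as a small token-resolution helper. This is only an illustration, not the code added by this PR: the function name `resolve_vllm_api_token` is hypothetical, and the precedence shown (per-request header first, then config.api_token, then the VLLM_API_TOKEN env var) is an assumption that the PR description does not spell out.

```python
import json
import os
from typing import Mapping, Optional

# Header name taken from the PR description.
PROVIDER_DATA_HEADER = "X-LlamaStack-Provider-Data"


def resolve_vllm_api_token(
    headers: Mapping[str, str],
    config_api_token: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: pick the token to use for one request.

    Assumed precedence (not confirmed by the PR text):
    dynamic header value > config.api_token > VLLM_API_TOKEN env var.
    """
    env = os.environ if env is None else env

    # Plain dict lookup is case-sensitive; a real HTTP framework would
    # normally expose case-insensitive headers.
    raw = headers.get(PROVIDER_DATA_HEADER)
    if raw:
        try:
            provider_data = json.loads(raw)
        except json.JSONDecodeError:
            provider_data = None  # malformed header: fall back to static token
        if isinstance(provider_data, dict):
            token = provider_data.get("vllm_api_token")
            if token:
                return token

    # Static configuration.
    return config_api_token or env.get("VLLM_API_TOKEN")
```

With this sketch, a request that carries the header gets its own token, while requests without it keep using the statically configured one.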

## Test Plan

```
curl -X POST "http://localhost:8000/v1/chat/completions" -H "Authorization: Bearer my-dynamic-token" \
  -H "X-LlamaStack-Provider-Data: {\"vllm_api_token\": \"Bearer my-dynamic-token\", \"vllm_url\": \"http://dynamic-server:8000\"}" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8b", "messages": [{"role": "user", "content": "Hello!"}]}'
```
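
For clients written in Python, the provider-data header from the curl command can be built programmatically. The sketch below is a hedged illustration: `build_provider_data_headers` is a hypothetical helper name, only the header name and the `vllm_api_token` / `vllm_url` keys come from the command above, and the `Authorization` header that the curl command also sends is left to the caller.

```python
import json
from typing import Dict, Optional


def build_provider_data_headers(
    vllm_api_token: str, vllm_url: Optional[str] = None
) -> Dict[str, str]:
    """Hypothetical helper: build headers carrying per-request vLLM credentials."""
    provider_data = {"vllm_api_token": vllm_api_token}
    if vllm_url is not None:
        # Optional per-request server override, as in the curl example.
        provider_data["vllm_url"] = vllm_url
    return {
        "Content-Type": "application/json",
        # The header value is a JSON object serialized to a string.
        "X-LlamaStack-Provider-Data": json.dumps(provider_data),
    }


# Same values as the curl command in the test plan.
headers = build_provider_data_headers(
    "Bearer my-dynamic-token", "http://dynamic-server:8000"
)
body = json.dumps(
    {
        "model": "llama-3.1-8b",
        "messages": [{"role": "user", "content": "Hello!"}],
    }
).encode("utf-8")
```

The resulting `headers` and `body` can be passed to any HTTP client and POSTed to the `/v1/chat/completions` endpoint used in the test plan.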

---------

Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
2025-09-18 11:13:55 +02:00

| Name | Last commit | Date |
| --- | --- | --- |
| anthropic | chore: update the anthropic inference impl to use openai-python for openai-compat functions (#3366) | 2025-09-07 14:00:42 -07:00 |
| azure | feat: add Azure OpenAI inference provider support (#3396) | 2025-09-11 13:48:38 +02:00 |
| bedrock | fix: AWS Bedrock inference profile ID conversion for region-specific endpoints (#3386) | 2025-09-11 11:41:53 +02:00 |
| cerebras | feat(starter)!: simplify starter distro; litellm model registry changes (#2916) | 2025-07-25 15:02:04 -07:00 |
| databricks | feat(starter)!: simplify starter distro; litellm model registry changes (#2916) | 2025-07-25 15:02:04 -07:00 |
| fireworks | refactor(logging): rename llama_stack logger categories (#3065) | 2025-08-21 17:31:04 -07:00 |
| gemini | chore: update the gemini inference impl to use openai-python for openai-compat functions (#3351) | 2025-09-06 12:22:20 -07:00 |
| groq | chore: update the groq inference impl to use openai-python for openai-compat functions (#3348) | 2025-09-06 15:36:27 -07:00 |
| llama_openai_compat | chore: indicate to mypy that InferenceProvider.rerank is concrete (#3238) | 2025-08-22 12:02:13 -07:00 |
| nvidia | docs: add VLM NIM example (#3277) | 2025-08-29 16:23:52 -07:00 |
| ollama | feat(tests): auto-merge all model list responses and unify recordings (#3320) | 2025-09-03 11:33:03 -07:00 |
| openai | refactor(logging): rename llama_stack logger categories (#3065) | 2025-08-21 17:31:04 -07:00 |
| passthrough | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) | 2025-07-30 23:30:53 -07:00 |
| runpod | ci: test safety with starter (#2628) | 2025-07-09 16:53:50 +02:00 |
| sambanova | chore: update the sambanova inference impl to use openai-python for openai-compat functions (#3345) | 2025-09-06 12:25:13 -07:00 |
| tgi | feat: add dynamic model registration support to TGI inference (#3417) | 2025-09-15 15:52:40 -04:00 |
| together | feat: add embedding and dynamic model support to Together inference adapter (#3458) | 2025-09-16 11:53:41 -07:00 |
| vertexai | ci: Re-enable pre-commit to fail (#3399) | 2025-09-10 10:00:46 -04:00 |
| vllm | feat: Add dynamic authentication token forwarding support for vLLM (#3388) | 2025-09-18 11:13:55 +02:00 |
| watsonx | chore: various watsonx fixes (#3428) | 2025-09-16 13:55:10 +02:00 |
| __init__.py | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |