
Conversation

@akram akram commented Sep 9, 2025

What does this PR do?

Add dynamic authentication token forwarding support for vLLM provider

This enables per-request authentication tokens for vLLM providers, supporting use cases like RAG operations where different requests may need different authentication tokens. The implementation follows the same pattern as other providers like Together AI, Fireworks, and Passthrough.

  • Add LiteLLMOpenAIMixin that manages the vllm_api_token properly

Usage:

  • Static: VLLM_API_TOKEN env var or config.api_token
  • Dynamic: X-LlamaStack-Provider-Data header with vllm_api_token

All existing functionality is preserved while adding new dynamic capabilities.

Test Plan

curl -X POST "http://localhost:8000/v1/chat/completions" -H "Authorization: Bearer my-dynamic-token" \
  -H "X-LlamaStack-Provider-Data: {\"vllm_api_token\": \"Bearer my-dynamic-token\", \"vllm_url\": \"http://dynamic-server:8000\"}" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8b", "messages": [{"role": "user", "content": "Hello!"}]}'
  

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 9, 2025
@akram akram marked this pull request as ready for review September 9, 2025 16:10
akram commented Sep 9, 2025

/assign @grs
/assign @leseb

@ashwinb ashwinb left a comment


lgtm modulo a minor comment

@akram akram force-pushed the vllm-support-for-dynamic-token branch 5 times, most recently from 540bc4d to e08b54c Compare September 10, 2025 13:32
akram commented Sep 10, 2025

After more extensive testing, the initial implementation turned out to be insufficient: the Responses API and the Agents API were not properly using the provider data.

So, I had to add a second commit to implement it correctly. The inference_api needed to be wrapped so that headers could be extracted properly.
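The wrapping described here can be sketched as a thin proxy that captures the per-request provider-data header into a context variable before delegating to the real inference API. All names below (`InferenceApiWrapper`, `FakeInference`, the private context variable) are hypothetical and only mimic the idea of llama-stack's PROVIDER_DATA_VAR, not its actual implementation:

```python
import contextvars
import json

# Request-scoped provider data (mirrors the PROVIDER_DATA_VAR idea).
_provider_data_var = contextvars.ContextVar("provider_data", default=None)


class InferenceApiWrapper:
    """Hypothetical wrapper: extract provider-data headers, then delegate."""

    def __init__(self, inner):
        self.inner = inner

    def chat_completion(self, headers: dict, **kwargs):
        raw = headers.get("X-LlamaStack-Provider-Data")
        token = _provider_data_var.set(json.loads(raw) if raw else None)
        try:
            # The inner provider sees the provider data via the context var.
            return self.inner.chat_completion(**kwargs)
        finally:
            _provider_data_var.reset(token)


class FakeInference:
    """Stand-in provider that reads the per-request token from context."""

    def chat_completion(self, **kwargs):
        data = _provider_data_var.get() or {}
        return data.get("vllm_api_token", "<static token>")


wrapper = InferenceApiWrapper(FakeInference())
print(wrapper.chat_completion(
    {"X-LlamaStack-Provider-Data": '{"vllm_api_token": "tok-abc"}'}))
# → tok-abc
```

Resetting the variable in a `finally` block keeps one request's token from leaking into the next request handled on the same context.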

/assign @grs

/assign @ashwinb

@ashwinb can you PTAL a second time?

@akram akram force-pushed the vllm-support-for-dynamic-token branch 4 times, most recently from ad79c22 to 87edba0 Compare September 10, 2025 18:56
ashwinb commented Sep 10, 2025

This looks a fair bit complex. I believe there's an easier way or maybe the request state is not being propagated correctly. Will look into this in detail soon. Hold on...

akram commented Sep 11, 2025

/hold

@akram akram force-pushed the vllm-support-for-dynamic-token branch 2 times, most recently from 2d87dc2 to 820724d Compare September 11, 2025 08:59
@akram akram force-pushed the vllm-support-for-dynamic-token branch 2 times, most recently from c192d5b to e3ddcb5 Compare September 11, 2025 09:05
akram commented Sep 11, 2025

@ashwinb you are right. I think I got confused by my own bug. It seems that just adding the correct providers to agents and responses forwards PROVIDER_DATA_VAR correctly.
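That PROVIDER_DATA_VAR is forwarded without extra plumbing matches how Python contextvars behave: a value set in a request's context is visible to nested awaits within that same context. A minimal illustration of that mechanism; only the variable name echoes the one mentioned above, and the two coroutines are invented for this sketch:

```python
import asyncio
import contextvars

PROVIDER_DATA_VAR = contextvars.ContextVar("provider_data", default=None)


async def vllm_provider_call():
    # Deep inside the provider, the per-request data is still visible.
    data = PROVIDER_DATA_VAR.get()
    return data["vllm_api_token"] if data else None


async def agents_api(request_provider_data):
    # The server sets the var once per request; nested awaits inherit it.
    PROVIDER_DATA_VAR.set(request_provider_data)
    return await vllm_provider_call()


token = asyncio.run(agents_api({"vllm_api_token": "tok-xyz"}))
print(token)  # → tok-xyz
```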

Can you PTAL?

/hold cancel

@akram akram requested a review from ashwinb September 11, 2025 09:13
@akram akram force-pushed the vllm-support-for-dynamic-token branch 6 times, most recently from 05b15a4 to df220d5 Compare September 11, 2025 14:44
@akram akram force-pushed the vllm-support-for-dynamic-token branch 2 times, most recently from c0060a2 to 23404dc Compare September 11, 2025 17:09
akram commented Sep 11, 2025

@mattf can you PTAL?

…ovider

This enables per-request authentication tokens for vLLM providers, supporting use cases like RAG operations where different requests may need different authentication tokens. The implementation follows the same pattern as other providers like Together AI, Fireworks, and Passthrough.

- Add LiteLLMOpenAIMixin that manages the vllm_api_token properly

Usage:

- Static: VLLM_API_TOKEN env var or config.api_token
- Dynamic: X-LlamaStack-Provider-Data header with vllm_api_token
- All existing functionality is preserved while adding new dynamic capabilities.

Signed-off-by: Akram Ben Aissi <[email protected]>
@akram akram force-pushed the vllm-support-for-dynamic-token branch from 23404dc to 5a74aa8 Compare September 11, 2025 19:42