
Conversation

@akram akram commented Sep 9, 2025

What does this PR do?

Add dynamic authentication token forwarding support for vLLM provider

This enables per-request authentication tokens for vLLM providers, supporting use cases like RAG operations where different requests may need different authentication tokens. The implementation follows the same pattern as other providers like Together AI, Fireworks, and Passthrough.

  • Add LiteLLMOpenAIMixin that manages the vllm_api_token properly

Usage:

  • Static: VLLM_API_TOKEN env var or config.api_token
  • Dynamic: X-LlamaStack-Provider-Data header with vllm_api_token

All existing functionality is preserved while adding new dynamic capabilities.

Test Plan

curl -X POST "http://localhost:8000/v1/chat/completions" -H "Authorization: Bearer my-dynamic-token" \
  -H "X-LlamaStack-Provider-Data: {\"vllm_api_token\": \"Bearer my-dynamic-token\", \"vllm_url\": \"http://dynamic-server:8000\"}" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8b", "messages": [{"role": "user", "content": "Hello!"}]}'
  

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 9, 2025
@akram akram marked this pull request as ready for review September 9, 2025 16:10
akram commented Sep 9, 2025

/assign @grs
/assign @leseb

@ashwinb ashwinb left a comment


lgtm modulo a minor comment

@akram akram force-pushed the vllm-support-for-dynamic-token branch 5 times, most recently from 540bc4d to e08b54c Compare September 10, 2025 13:32
akram commented Sep 10, 2025

After more extensive testing, the initial implementation turned out to be insufficient: the Responses API and the Agents API were not properly using the provider data.

So, I had to add a second commit to implement it correctly. The inference_api needed to be wrapped so that headers could be extracted properly.
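The wrapping described here can be sketched as a thin proxy that captures the per-request provider-data header into a context variable before delegating to the real inference API. All names below (`InferenceApiWrapper`, `FakeInference`, the private context variable) are hypothetical and only mimic the idea of llama-stack's PROVIDER_DATA_VAR, not its actual implementation:

```python
import contextvars
import json

# Request-scoped provider data (mirrors the PROVIDER_DATA_VAR idea).
_provider_data_var = contextvars.ContextVar("provider_data", default=None)


class InferenceApiWrapper:
    """Hypothetical wrapper: extract provider-data headers, then delegate."""

    def __init__(self, inner):
        self.inner = inner

    def chat_completion(self, headers: dict, **kwargs):
        raw = headers.get("X-LlamaStack-Provider-Data")
        token = _provider_data_var.set(json.loads(raw) if raw else None)
        try:
            # The inner provider sees the provider data via the context var.
            return self.inner.chat_completion(**kwargs)
        finally:
            _provider_data_var.reset(token)


class FakeInference:
    """Stand-in provider that reads the per-request token from context."""

    def chat_completion(self, **kwargs):
        data = _provider_data_var.get() or {}
        return data.get("vllm_api_token", "<static token>")


wrapper = InferenceApiWrapper(FakeInference())
print(wrapper.chat_completion(
    {"X-LlamaStack-Provider-Data": '{"vllm_api_token": "tok-abc"}'}))
# → tok-abc
```

Resetting the variable in a `finally` block keeps one request's token from leaking into the next request handled on the same context.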

/assign @grs

/assign @ashwinb

@ashwinb can you PTAL a second time?

@akram akram force-pushed the vllm-support-for-dynamic-token branch 4 times, most recently from ad79c22 to 87edba0 Compare September 10, 2025 18:56
ashwinb commented Sep 10, 2025

This looks a fair bit complex. I believe there's an easier way or maybe the request state is not being propagated correctly. Will look into this in detail soon. Hold on...

akram commented Sep 11, 2025

/hold

@akram akram force-pushed the vllm-support-for-dynamic-token branch 2 times, most recently from 2d87dc2 to 820724d Compare September 11, 2025 08:59
@akram akram force-pushed the vllm-support-for-dynamic-token branch 2 times, most recently from c192d5b to e3ddcb5 Compare September 11, 2025 09:05
akram commented Sep 11, 2025

@ashwinb you are right. I think I got confused by my own bug. It seems that just adding the correct providers to agents and responses forwards PROVIDER_DATA_VAR correctly.
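That PROVIDER_DATA_VAR is forwarded without extra plumbing matches how Python contextvars behave: a value set in a request's context is visible to nested awaits within that same context. A minimal illustration of that mechanism; only the variable name echoes the one mentioned above, and the two coroutines are invented for this sketch:

```python
import asyncio
import contextvars

PROVIDER_DATA_VAR = contextvars.ContextVar("provider_data", default=None)


async def vllm_provider_call():
    # Deep inside the provider, the per-request data is still visible.
    data = PROVIDER_DATA_VAR.get()
    return data["vllm_api_token"] if data else None


async def agents_api(request_provider_data):
    # The server sets the var once per request; nested awaits inherit it.
    PROVIDER_DATA_VAR.set(request_provider_data)
    return await vllm_provider_call()


token = asyncio.run(agents_api({"vllm_api_token": "tok-xyz"}))
print(token)  # → tok-xyz
```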

Can you PTAL?

/hold cancel

@akram akram requested a review from ashwinb September 11, 2025 09:13
@akram akram force-pushed the vllm-support-for-dynamic-token branch 6 times, most recently from 05b15a4 to df220d5 Compare September 11, 2025 14:44
@akram akram force-pushed the vllm-support-for-dynamic-token branch 2 times, most recently from c0060a2 to 23404dc Compare September 11, 2025 17:09
akram commented Sep 11, 2025

@mattf can you PTAL?

…ovider

This enables per-request authentication tokens for vLLM providers, supporting use cases like RAG operations where different requests may need different authentication tokens. The implementation follows the same pattern as other providers like Together AI, Fireworks, and Passthrough.

- Add LiteLLMOpenAIMixin that manages the vllm_api_token properly

Usage:

- Static: VLLM_API_TOKEN env var or config.api_token
- Dynamic: X-LlamaStack-Provider-Data header with vllm_api_token
- All existing functionality is preserved while adding new dynamic capabilities.

Signed-off-by: Akram Ben Aissi <[email protected]>
@akram akram force-pushed the vllm-support-for-dynamic-token branch from 23404dc to 5a74aa8 Compare September 11, 2025 19:42