-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feat: Add dynamic authentication token forwarding support for vLLM #3388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm modulo a minor comment
540bc4d
to
e08b54c
Compare
After more extensive tests, the initial implementation was insufficient to make the So, I had to add this second commit to implement it correctly. It was requiring the /assign @grs /assign @ashwinb @ashwinb can PTAL a second time? |
ad79c22
to
87edba0
Compare
This looks a fair bit complex. I believe there's an easier way or maybe the request state is not being propagated correctly. Will look into this in detail soon. Hold on... |
/hold |
2d87dc2
to
820724d
Compare
c192d5b
to
e3ddcb5
Compare
@ashwinb you are right . I think I got confused with my own bug. It seems that by just adding the correct providers to agents and responses it forwards can you PTAL ? /hold cancel |
05b15a4
to
df220d5
Compare
c0060a2
to
23404dc
Compare
@mattf can you PTAL? |
…ovider This enables per-request authentication tokens for vLLM providers, supporting use cases like RAG operations where different requests may need different authentication tokens. The implementation follows the same pattern as other providers like Together AI, Fireworks, and Passthrough. - Add LiteLLMOpenAIMixin that manages the vllm_api_token properly Usage: - Static: VLLM_API_TOKEN env var or config.api_token - Dynamic: X-LlamaStack-Provider-Data header with vllm_api_token - All existing functionality is preserved while adding new dynamic capabilities. Signed-off-by: Akram Ben Aissi <[email protected]>
23404dc
to
5a74aa8
Compare
What does this PR do?
Add dynamic authentication token forwarding support for vLLM provider
This enables per-request authentication tokens for vLLM providers, supporting use cases like RAG operations where different requests may need different authentication tokens. The implementation follows the same pattern as other providers like Together AI, Fireworks, and Passthrough.
Usage:
All existing functionality is preserved while adding new dynamic capabilities.
Test Plan