[Model] Support multi-vector retrieval #25370
Conversation
Are you OK with this API and its outputs? There are still some broken features that need to be fixed, but the multi_vector feature is now testable.
Ready for review. (Slight) breaking change:
```python
def encode2pooling_task(supported_tasks):
    # Currently no model supports both token_embed and token_classify.
    if "token_embed" in supported_tasks:
        return "token_embed"
    elif "token_classify" in supported_tasks:
        return "token_classify"
    else:
        raise ValueError(f"pooling_task must be one of {supported_tasks}.")
```
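For illustration, a quick sketch of how this helper resolves the task (the task sets below are hypothetical):

```python
# Hypothetical supported-task sets, just to show the selection order.
assert encode2pooling_task({"token_embed", "embed"}) == "token_embed"
assert encode2pooling_task({"token_classify"}) == "token_classify"
```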
Hi @noooop, I just tested and it works fine. The only thing that was missing is …
```diff
 def _set_default_parameters(self, model_config: Optional["ModelConfig"]):
-    if self.task == "embed":
+    if self.task in ["embed", "token_embed"]:
         if self.normalize is None:
             self.normalize = True
```
But setting normalize to False by default for token_embed could make sense, because at this step you often want the raw last hidden states.
Do you prefer to have both embed and token_embed use normalize = True by default, or to have token_embed default to normalize = False?
Personally, I hope that all similar APIs behave consistently, to reduce the learning burden on users.
Or we could give "get last hidden states" its own API with normalize = False by default. (Though wouldn't adding too many specialized APIs also increase the user burden and maintenance costs?)
After #20538 landed, you can control normalization per request: pass pooling_params = PoolingParams(normalize=False) offline, or use the normalize parameter online.
So you don't need override_pooler_config=PoolerConfig(normalize=False) at setup time; you can set it at any time.
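A minimal offline sketch of that per-request control, assuming the PoolingParams(normalize=...) field from #20538 (the model name is only an example):

```python
from vllm import LLM, PoolingParams

# runner="pooling" selects the pooling runner; the flag name may vary by version.
llm = LLM(model="BAAI/bge-m3", runner="pooling")

# Default: embeddings come back normalized.
normalized = llm.encode(["a sample sentence"])

# Per-request opt-out, instead of override_pooler_config at setup.
raw = llm.encode(
    ["a sample sentence"],
    pooling_params=PoolingParams(normalize=False),
)
```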
```diff
     engine_client,
     vllm_config,
     state.openai_serving_models,
+    pooling_task=encode2pooling_task(supported_tasks),
     request_logger=request_logger,
     chat_template=resolved_chat_template,
     chat_template_content_format=args.chat_template_content_format,
     log_error_stack=args.log_error_stack,
-) if "encode" in supported_tasks else None
+) if ("token_embed" in supported_tasks
+      or "token_classify" in supported_tasks) else None
```
IMO we should treat each task separately. Having sub-tasks just makes things more confusing.
The user should pass in the task explicitly.
For offline scenarios, we can add llm.token_embed and llm.token_classify, and gradually steer users away from calling llm.encode directly (see the sketch below).
For online scenarios, we need to adaptively select token_embed or token_classify with something like the encode2pooling_task method, unless you want to split the /pooling API into /pooling_token_embed and /pooling_token_classify.
I personally feel that /pooling_token_embed and /pooling_token_classify look terrible; the online /pooling API is not ready for major changes yet. We can collect usage scenarios for a while first.
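A minimal sketch of what the proposed offline wrappers could look like (hypothetical helpers; the eventual method names and signatures in vLLM may differ):

```python
from vllm import LLM

def token_embed(llm: LLM, prompts, **kwargs):
    # Explicitly request per-token embeddings rather than relying on
    # encode2pooling_task to pick a sub-task.
    return llm.encode(prompts, pooling_task="token_embed", **kwargs)

def token_classify(llm: LLM, prompts, **kwargs):
    # Explicitly request per-token classification scores.
    return llm.encode(prompts, pooling_task="token_classify", **kwargs)
```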
This PR adds about 800 lines and changes 37 files.
If only the API needs to be modified, can we merge this PR first and discuss the frontend and documentation changes in #25524?
Otherwise, I would have to resolve conflicts and re-run tests every day.
Please update this page https://docs.vllm.ai/en/latest/models/pooling_models.html#model-conversion so it no longer uses the encode task.
I'll delay the merge of this PR until after the release, so we don't have to worry about backward-compatibility issues that would further complicate future PRs.
Purpose
After #21227 landed, we hope that pooling models can always use ALL pooling, so users don't need to enable it manually.
The current encode API (the /pooling API) mainly targets the classify-each-token scenario (e.g., TokenClassification #24872 and reward models) and overlooks the embed-each-token scenario.
Let's support the embed-each-token scenario (multi-vector retrieval), where each text is represented by one vector per token rather than a single pooled vector.
Partially fixes #25165.
This brings us a step closer to supporting ColBERT and ColPali.
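To illustrate what multi-vector retrieval does with these per-token embeddings, here is a minimal sketch of ColBERT-style MaxSim scoring (illustrative only; not part of this PR):

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """query_vecs: (Q, dim) and doc_vecs: (D, dim) per-token embeddings.
    Assumes both are L2-normalized (normalize=True), so dot products
    are cosine similarities."""
    sim = query_vecs @ doc_vecs.T          # (Q, D) token-token similarities
    return float(sim.max(axis=1).sum())    # best doc token per query token
```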
cc @DarkLight1337 @maxdebayser
(Slight) Breaking change
By the way, in #20538:
Test Plan
tests/models/language/pooling/test_multi_vector_retrieval.py
tests/test_pooling_params.py
Test Result
pass
Known Issues
Essential Elements of an Effective PR Description Checklist
(Optional) Documentation update, such as supported_models.md and examples for a new model.