add checks for a dimensionality of a query for BF, HNSW and IVF #1291
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alexanderguzhva
Signed-off-by: Alexandr Guzhva <[email protected]>
01906de to 40823cc
@sparknack Hi, could you please confirm that the logic is valid for sparse vectors? Thanks.
if ((dim != Dim()) && (IsMetricType(metric_str, metric::COSINE) || IsMetricType(metric_str, metric::IP) ||
                       IsMetricType(metric_str, metric::L2))) {
    const std::string msg_e =
        fmt::format("dimensionalities of the base dataset ({}) and the query ({}) do not match", Dim(), dim);
Put this repeated code into a utility function.
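A minimal sketch of what such a shared helper might look like. The names `IsDimCheckedMetric` and `CheckQueryDim` are hypothetical, plain strings stand in for the `metric::` constants and `IsMetricType` used in the diff, and `std::to_string` replaces `fmt::format` so the snippet builds on its own. The real helper would presumably return the project's own status type rather than an optional message.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for the three IsMetricType(...) checks in the diff:
// true for the metrics this PR applies the dimension check to.
inline bool
IsDimCheckedMetric(const std::string& metric) {
    return metric == "COSINE" || metric == "IP" || metric == "L2";
}

// Hypothetical helper: returns an error message when the base and query
// dimensionalities differ for a checked metric, std::nullopt otherwise.
inline std::optional<std::string>
CheckQueryDim(int64_t base_dim, int64_t query_dim, const std::string& metric) {
    if (base_dim != query_dim && IsDimCheckedMetric(metric)) {
        return "dimensionalities of the base dataset (" + std::to_string(base_dim) +
               ") and the query (" + std::to_string(query_dim) + ") do not match";
    }
    return std::nullopt;
}
```

Each index (BF, HNSW, IVF) could then call the helper once and turn a non-empty result into its usual error return, instead of repeating the condition and the message.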
If I remember correctly, this should have already been intercepted in Milvus. What issue are you encountering here?
A KIOXIA unit test triggered this problem.
@@ -119,6 +119,17 @@ BruteForce::Search(const DataSetPtr base_dataset, const DataSetPtr query_dataset
    auto labels = std::make_unique<int64_t[]>(nq * topk);
    auto distances = std::make_unique<float[]>(nq * topk);

    const std::string metric_str = cfg.metric_type.value();
    if ((base_dataset->GetDim() != query_dataset->GetDim()) &&
        (IsMetricType(metric_str, metric::COSINE) || IsMetricType(metric_str, metric::IP) ||
IP metrics could also work with sparse vectors.
Can we skip all the query dimension checks by identifying whether the base_dataset and query_dataset are sparse?
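One possible reading of this suggestion, as a rough sketch: gate the dimension comparison on whether either dataset is sparse, rather than on the metric type. The `DatasetInfo` struct, its `is_sparse` flag, and both function names are hypothetical and only illustrate the control flow. How the real dataset type exposes sparsity is not shown in this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of a dataset for illustration only.
struct DatasetInfo {
    int64_t dim;
    bool is_sparse;
};

// The dimension check is only meaningful when both sides are dense.
inline bool
NeedsDimCheck(const DatasetInfo& base, const DatasetInfo& query) {
    return !base.is_sparse && !query.is_sparse;
}

// Returns true when the query may be searched against the base dataset.
inline bool
DimsCompatible(const DatasetInfo& base, const DatasetInfo& query) {
    if (!NeedsDimCheck(base, query)) {
        return true;  // sparse data: skip the dimension comparison
    }
    return base.dim == query.dim;
}
```

With this shape, an IP search over sparse vectors would pass through unchanged, while a dense IP search with mismatched dimensions would still be rejected.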
/kind improvement
issue: #1290