Deduplicate autocomplete when scientific and common names share substrings #4510
base: development
…names share substrings

Previously, when searching for a substring that appeared in both an OTU's scientific name and common name, the autocomplete would return duplicate entries for the same OTU. This created a confusing user experience where the same OTU would appear multiple times in the dropdown with identical visual labels.

The issue occurred in the api_autocomplete_extended method where duplicate detection was based on the label_target's ID and class name, which didn't account for different query matches (scientific vs common name) that rendered the same visual text.

This fix improves the deduplication logic by:
- Generating the actual rendered label text for each result
- Using OTU ID + visual label text as the deduplication key
- Ensuring each unique OTU+label combination appears only once

Added comprehensive test coverage to verify that visual duplicates are properly filtered while maintaining the ability to search by both scientific and common names.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
- Add test case that directly reproduces the original bug where common names and scientific names sharing substrings caused duplicates
- Extract helper method to reduce code duplication in label extraction
- Reorganize existing tests with clearer context descriptions
- Add comments explaining what each test scenario covers
- Improve test maintainability with better structure and naming

The new tests specifically verify that searching for "ashton" returns only one result when it appears in both the scientific name (ashtoni) and common name (ashton cuckoo bumble bee), while still maintaining searchability by either name type independently.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
otu_label_pairs = results.map { |r| [r[:otu].id, extract_label(r)] }

# Verify no duplicates exist
expect(otu_label_pairs.uniq).to eq(otu_label_pairs)
I think in this case, where your synonym is a homonym, we do get duplicate labels in the UI, since the otus for the two related taxa have all of the same data used to build the autocomplete label: (otu name, taxon name, taxon name cached) and have different otu ids.
So there will still be duplicated results possible for otu autocomplete, but with your fix here we should no longer get duplicate results for the same otu.
The duplicates caused by homonyms are a separate issue, I think, with a different sort of solution - for now I think you can just remove this 'with synonyms relationship' context.
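To illustrate the distinction drawn in this comment, here is a minimal sketch in plain Ruby. The `Row` struct, the OTU ids, and the label text are hypothetical stand-ins, not TaxonWorks code. It only shows that keying on OTU id plus label collapses repeats of one OTU while two OTUs that share a label both remain.

```ruby
# Hypothetical stand-in for an autocomplete result row (not the real structure).
Row = Struct.new(:otu_id, :label)

rows = [
  Row.new(1, 'Aus bus'), # OTU 1, matched once
  Row.new(1, 'Aus bus'), # OTU 1 matched again: same OTU, same label
  Row.new(2, 'Aus bus')  # OTU 2 for a homonymous name: different OTU, same label
]

# Deduplicate on OTU id + visible label, as the PR describes.
kept = rows.uniq { |r| [r.otu_id, r.label] }

# Labels that still appear more than once after deduplication.
repeated_labels = kept.group_by(&:label).select { |_, v| v.size > 1 }.keys

puts kept.map(&:otu_id).inspect
puts repeated_labels.inspect
```

In this sketch the second row for OTU 1 is dropped, but the label still shows up twice because OTU 2 carries identical label data, which is the homonym case set aside above.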
@@ -221,8 +221,18 @@ def api_autocomplete_extended
    compact = []

    r.each do |h|
      g = h[:label_target].id.to_s + h[:label_target].class.name
      m = [ h[:otu].id, g ]
      # Generate the actual rendered label to detect visual duplicates
This fixes the issue, but let's move the solution to where we can actually create the current label, in app/views/otus/api/v1/autocomplete.json.jbuilder - that will keep us from getting out of sync with the display code.
See this API call for example (illustrated in TaxonPages screenshots above):
https://sfg.taxonworks.org/api/v1/otus/autocomplete?project_token=ekMTicbZWijqmdpHKqs_TA&having_taxon_name_only=true&include_common_names=true&term=ashton
And try it out here: https://ag.purdue.edu/department/entm/perc/search-collection.html
Fix duplicate OTU entries in autocomplete when scientific and common names share substrings
Previously, when searching for a substring that appeared in both an OTU's scientific name
and common name, the autocomplete would return duplicate entries for the same OTU. This
created a confusing user experience where the same OTU would appear multiple times in
the dropdown with identical visual labels.
The issue occurred in the api_autocomplete_extended method where duplicate detection was
based on the label_target's ID and class name, which didn't account for different query
matches (scientific vs common name) that rendered the same visual text.
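This fix improves the deduplication logic by:
- Generating the actual rendered label text for each result
- Using OTU ID + visual label text as the deduplication key
- Ensuring each unique OTU+label combination appears only once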
Added comprehensive test coverage to verify that visual duplicates are properly filtered
while maintaining the ability to search by both scientific and common names.
🤖 Generated with Claude Code
Co-Authored-By: Claude [email protected]
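The change of deduplication key described above can be sketched in plain Ruby as follows. This is an illustration under assumptions, not the code in the PR: the `Match` struct, the ids, the class-name strings, and the label text are all hypothetical. It assumes a scientific-name match and a common-name match point at different label targets but render the same text.

```ruby
# Hypothetical stand-in for one autocomplete match (not the real structure).
Match = Struct.new(:otu_id, :label_target_id, :label_target_class, :label)

# Hypothetical rendered label; the real label format may differ.
label = 'ashtoni [ashton cuckoo bumble bee]'

matches = [
  Match.new(1, 10, 'TaxonName', label),  # found via the scientific name
  Match.new(1, 20, 'CommonName', label)  # found via the common name
]

# Old key: OTU id + label target id and class name.
by_target = matches.uniq { |m| [m.otu_id, m.label_target_id.to_s + m.label_target_class] }

# New key: OTU id + the text the user actually sees.
by_label = matches.uniq { |m| [m.otu_id, m.label] }

puts by_target.size
puts by_label.size
```

Under these assumptions the old key treats the two matches as distinct, so both reach the dropdown, while the label-based key keeps a single entry for the OTU.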