@jmr
Contributor

@jmr jmr commented Jun 2, 2020

Process the BitVector unit-by-unit instead of bit-by-bit.

Use PopCount::count() to update num_1s and use select_bit to find
the bit positions for the select0s_ and select1s_ indexes.

According to my benchmarks, the old bit-by-bit version processed a
256kbit vector at about 20MB/s independent of enables_select0 and
enables_select1.

The new version is 50x-150x faster, depending on the compiler and build_index
options.

enables_select0=enables_select1=false:
no popcnt, no bmi2: 1600MB/s
popcnt, no bmi2: 2500MB/s
popcnt and bmi2: 2900MB/s

enables_select0=enables_select1=true:
no popcnt, no bmi2: 1100MB/s
popcnt, no bmi2: 1600MB/s
popcnt and bmi2: 1800MB/s
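
The unit-by-unit approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: `IndexSketch`, `popcount64`, `select_in_unit`, `build_index_sketch`, the 64-bit unit width, and the sampling rate of 512 are all assumptions made for the example, and the portable loops stand in for the hardware popcnt and bmi2 paths mentioned in the benchmarks.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for this sketch; not the marisa-trie layout.
struct IndexSketch {
  std::size_t num_1s = 0;             // total number of 1-bits seen
  std::vector<std::size_t> select1s;  // bit position of every sample_rate-th 1-bit
};

// Portable population count (stand-in for a hardware popcnt instruction).
inline std::size_t popcount64(std::uint64_t x) {
  std::size_t n = 0;
  while (x != 0) {
    x &= x - 1;  // clear the lowest set bit
    ++n;
  }
  return n;
}

// Position of the i-th (0-based) set bit in `unit`.
// The caller must ensure i < popcount64(unit).
inline std::size_t select_in_unit(std::size_t i, std::uint64_t unit) {
  for (; i > 0; --i) unit &= unit - 1;  // drop the i lowest set bits
  std::size_t pos = 0;
  while ((unit & 1) == 0) {
    unit >>= 1;
    ++pos;
  }
  return pos;
}

// Walks the vector one 64-bit unit at a time instead of one bit at a time.
// sample_rate must be greater than zero (assumed 512 here for illustration).
inline IndexSketch build_index_sketch(const std::vector<std::uint64_t>& units,
                                      std::size_t sample_rate = 512) {
  IndexSketch index;
  for (std::size_t u = 0; u < units.size(); ++u) {
    const std::size_t ones = popcount64(units[u]);
    // Smallest sampled rank that is >= the number of 1s seen so far.
    const std::size_t next =
        (index.num_1s + sample_rate - 1) / sample_rate * sample_rate;
    // Record a position only when a sampled rank falls inside this unit.
    for (std::size_t r = next; r < index.num_1s + ones; r += sample_rate) {
      index.select1s.push_back(u * 64 +
                               select_in_unit(r - index.num_1s, units[u]));
    }
    index.num_1s += ones;
  }
  return index;
}
```

The point of the structure is that most units need only one population count; the per-bit search runs only in the few units that contain a sampled rank. A select0s index could be built the same way from the complemented unit.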

jmr added 2 commits June 3, 2020 08:28
The 32-bit select_bit will be used to make the new build_index
implementation work for MARISA_WORD_SIZE == 32.
This is already used by build_index and fixes the 32-bit build.
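
For readers unfamiliar with the operation, a 32-bit select over a single word can be sketched as below. This is an illustrative, portable version under assumed names (`popcount32`, `select_bit32`); it does not reproduce the signature or the implementation of the `select_bit` added in these commits.

```cpp
#include <cstddef>
#include <cstdint>

// Portable 32-bit population count using the classic bit-parallel reduction.
inline std::uint32_t popcount32(std::uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;
  return (x * 0x01010101u) >> 24;
}

// Position of the i-th (0-based) set bit in `unit`.
// The caller must ensure i < popcount32(unit).
inline std::size_t select_bit32(std::size_t i, std::uint32_t unit) {
  std::size_t pos = 0;
  // Binary search: halve the candidate window from 32 bits down to 1 bit.
  for (unsigned width = 16; width > 0; width >>= 1) {
    const std::uint32_t low = unit & ((1u << width) - 1u);
    const std::size_t low_ones = popcount32(low);
    if (i < low_ones) {
      unit = low;       // the target bit is in the lower half
    } else {
      i -= low_ones;    // skip the 1s in the lower half
      unit >>= width;   // continue in the upper half
      pos += width;
    }
  }
  return pos;
}
```

For example, with `unit = 0b1010` the set bits are at positions 1 and 3, so `select_bit32(0, unit)` yields 1 and `select_bit32(1, unit)` yields 3.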
@s-yata s-yata self-assigned this Jun 15, 2020
@s-yata
Owner

s-yata commented Jun 16, 2020

Benchmark

The following table shows build speed [1,000 keys/second].

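| #tries | s-yata:master [K/s] | jmr:build-index [K/s] |
|--------|---------------------|-----------------------|
| 1      | 1,054.64            | 1,087.85              |
| 2      | 915.17              | 937.07                |
| 3      | 901.00              | 920.59                |
| 4      | 896.08              | 914.57                |
| 5      | 894.30              | 912.47                |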

jmr:build-index is 2-3% faster than s-yata:master.

@jmr
Contributor Author

jmr commented Jun 16, 2020

> jmr:build-index is 2-3% faster than s-yata:master.

Did you configure with --enable-native-code? popcnt and select_bit are going to be important.

My benchmark was just on BitVector::build_index. I don't know what fraction of marisa-benchmark is spent in build_index, so I can't say whether more than 2-3% is expected.

I will have time to run/profile the benchmarks myself later in the week.

@s-yata
Owner

s-yata commented Jun 16, 2020

The table shows the speed of dictionary construction, and BitVector::build_index is not a major part of it.
However, I think the improvement is enough to accept this pull request.

@s-yata s-yata merged commit 0873e86 into s-yata:master Jun 17, 2020
@s-yata
Owner

s-yata commented Jun 17, 2020

It looks good to me.
Thank you!

@jmr jmr deleted the build-index branch May 20, 2025 15:48