Bigger bloom filter size #164
Replies: 6 comments
-
The filter size is stored in a single byte, which is why it cannot exceed 255. That is a reasonable Bloom filter size for the use cases we want to address with stream filtering.
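To put the 255-byte ceiling in perspective, here is a rough sketch of how many distinct filter values fit into a filter of that size. It assumes the textbook Bloom filter approximation with an optimal number of hash functions; the actual implementation in RabbitMQ streams may use different parameters, and the function name `bloom_capacity` is made up for illustration.

```python
import math

MAX_FILTER_SIZE = 255  # largest value a single unsigned byte can store


def bloom_capacity(size_bytes: int, false_positive_rate: float) -> int:
    """Approximate number of distinct values a filter of `size_bytes` can hold
    at the given false-positive rate.

    Uses the textbook estimate n = m * (ln 2)^2 / -ln(p), where m is the
    filter size in bits. This is an assumption, not RabbitMQ's own formula.
    """
    bits = size_bytes * 8
    return int(bits * math.log(2) ** 2 / -math.log(false_positive_rate))


if __name__ == "__main__":
    for rate in (0.01, 0.1, 0.2):
        print(f"p={rate}: ~{bloom_capacity(MAX_FILTER_SIZE, rate)} values")
```

Under these assumptions the maximum filter holds a few hundred distinct values per chunk, not tens of thousands.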
-
To add to that, the choice of a single byte field was by design.
-
It is not "totally exotic", but most users with fleets of devices use MQTT with queues. We don't have sufficient evidence of much interest in larger cardinality for stream filtering.
-
@AntonSmolkov going from what Osiris has today to millions of "microstreams" within a stream would require 24 or 32 bytes for the Bloom filter. That's too big a jump to be considered a no-brainer, and there are only 4 bytes available in the chunk header right now. You should probably stick to MQTT with queues (see above) for now.
-
Well, given that consumers (applications) would subscribe to only tens of thousands of these "microstreams", the 1 million microstreams can be distributed across 10 (or more) streams using consistent hashing. Using one extra byte for the filter size seems to be sufficient under these circumstances (100k entries per filter with an error rate of 10%, as calculated above).
I'm concerned that regular queues might not be able to handle such a high ingest rate. In my scenario, the data from the sensors is already collected in a massive Kafka topic; the objective is to distribute subsets of data (sensors) to dozens of consumers (applications) with minimal traffic and storage overhead. But I'll explore this option further, thank you.
Let this enhancement proposal serve as the first piece of evidence :)
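The "one extra byte" estimate can be sanity-checked with a short sketch. It assumes the standard Bloom filter sizing formula with an optimal number of hash functions, and that the size field holds the filter size in bytes as an unsigned integer; both helper names are hypothetical and not part of any RabbitMQ API.

```python
import math


def bloom_size_bytes(n_items: int, false_positive_rate: float) -> int:
    """Approximate filter size in bytes for `n_items` distinct values.

    Uses the textbook estimate m = -n * ln(p) / (ln 2)^2 bits
    (an assumption, not RabbitMQ's own sizing rule).
    """
    bits = -n_items * math.log(false_positive_rate) / math.log(2) ** 2
    return math.ceil(bits / 8)


def size_field_bytes(filter_size_bytes: int) -> int:
    """Smallest number of whole bytes that can store the filter size
    as an unsigned integer."""
    return max(1, math.ceil(filter_size_bytes.bit_length() / 8))


if __name__ == "__main__":
    size = bloom_size_bytes(100_000, 0.10)
    print(f"filter size: ~{size} bytes")
    print(f"size field needs {size_field_bytes(size)} byte(s)")
```

Under these assumptions, a filter for 100k entries at a 10% error rate comes out at just under 60 kB, which fits in a 2-byte size field (maximum 65,535).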
-
Hi there!
Currently, a stream's Bloom filter size is limited to 255 bytes. The reason for this limitation is detailed in the corresponding article:
Additionally, the .NET client library documentation mentions:
So, why not allow setting any arbitrary integer for the Bloom filter size? Are there any technical restrictions to consider?
Describe the solution you'd like
I'd like to be able to set an arbitrary limit for the Bloom filter size.
Perhaps this ability would have to be enabled beforehand via a feature flag.
Describe alternatives you've considered
Additional context
Implementing the requested feature would make RabbitMQ Streams more competitive with NATS JetStream, whose consumers have a FilterSubjects property.
Filter size for 25k elements with 20% error rate
