-
Notifications
You must be signed in to change notification settings - Fork 4k
Description
Is your feature request related to a problem? Please describe.
I am trying to estimate maximum reachable OLTP performance for a client of mine. To my frustration I was not able to scale a CockroachDB cluster to significantly more than 256 nodes, due to high CPU load when adding more nodes (most of which is taken up by gossip-protocol related functions according to profiling). My measurements suggest that the work done for gossiping on each node scales quadratically in the number of nodes, which puts an upper limit on the maximum cluster size at about 1000 nodes.
Describe the solution you'd like
The gossip protocol should only perform linear work in the number of nodes.
Describe alternatives you've considered
The gossip protocol intervals could be configurable so larger clusters could be achievable by trading away DDL and node failure detection speed. However, this would only add a small factor to the maximum size until the quadratic growth would have pushed failure detection times too high.
Jira issue: CRDB-4006