gossip: abnormal CPU usage growth with increasing node count

**Is your feature request related to a problem? Please describe.**

I am trying to estimate the maximum reachable OLTP performance for a client of mine. To my frustration, I was not able to scale a CockroachDB cluster to significantly more than 256 nodes, because CPU load rises sharply as nodes are added; according to profiling, most of it is spent in gossip-protocol-related functions. My measurements suggest that the work each node does for gossip scales quadratically with the number of nodes, which caps the cluster size at about 1000 nodes.

**Describe the solution you'd like**

The work each node performs for the gossip protocol should grow at most linearly with the number of nodes.

**Describe alternatives you've considered**

The gossip protocol intervals could be made configurable, so that larger clusters become achievable by trading away DDL and node failure detection speed. However, this would raise the maximum cluster size by only a small factor before quadratic growth pushes failure detection times too high.

Jira issue: CRDB-4006

gossip: abnormal CPU usage growth with increasing node count #51838
