release-2.1: gossip: avoid removing nodes that get a new address #34198
Backport 1/1 commits from #34155.
/cc @cockroachdb/release
Fixes #34120.
In K8s deployments, a node can be restarted using an address previously
attributed to another node while that other node is still alive (for
example, when node addresses are reshuffled during a rolling restart).
Prior to this patch, the gossip code assumed that if a node started
with an address previously attributed to another node, that other node
must be dead, and it therefore (incorrectly) erased that node's entry,
removing it from the cluster.
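To make the difference concrete, here is a minimal Go sketch of the two behaviors. It is a simplified, hypothetical model, not the actual gossip code: `addrRegistry`, `registerBuggy`, `registerFixed`, and the local `NodeID` type are illustrative names. It only shows the idea that a new claimant of an address should take over the address mapping without erasing the previous holder's node entry.

```go
package main

// NodeID is a hypothetical stand-in for a node identifier.
type NodeID int32

// addrRegistry is a hypothetical, simplified model of per-node address
// bookkeeping. It is not CockroachDB's actual implementation.
type addrRegistry struct {
	nodes  map[NodeID]string // node -> last known address
	byAddr map[string]NodeID // address -> node that most recently claimed it
}

func newAddrRegistry() *addrRegistry {
	return &addrRegistry{nodes: map[NodeID]string{}, byAddr: map[string]NodeID{}}
}

// registerBuggy models the pre-patch assumption: if another node already
// holds addr, that node is presumed dead and its entry is erased.
func (r *addrRegistry) registerBuggy(id NodeID, addr string) {
	if prev, ok := r.byAddr[addr]; ok && prev != id {
		delete(r.nodes, prev) // drops a possibly still-live node
	}
	r.record(id, addr)
}

// registerFixed models the patched idea: the new node takes over the
// address mapping, but the previous holder's entry is kept. Deciding
// whether that node is dead is left to a separate liveness timeout.
func (r *addrRegistry) registerFixed(id NodeID, addr string) {
	r.record(id, addr)
}

// record stores id -> addr and the reverse mapping, dropping the node's
// stale reverse mapping if it moved away from an old address.
func (r *addrRegistry) record(id NodeID, addr string) {
	if old, ok := r.nodes[id]; ok && old != addr && r.byAddr[old] == id {
		delete(r.byAddr, old)
	}
	r.nodes[id] = addr
	r.byAddr[addr] = id
}

// has reports whether the registry still knows about node id.
func (r *addrRegistry) has(id NodeID) bool {
	_, ok := r.nodes[id]
	return ok
}
```

For example, if n4 holds `host-a:26257` and n3 restarts at that same address, `registerBuggy` drops n4 from the registry while `registerFixed` keeps both nodes and points the address at n3.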
This scenario can be reproduced like this:
Prior to this patch, this scenario would yield "n4 removed from the
cluster" on other nodes, and n3 did not restart properly. With the
patch, there is a period of time (until
server.time_until_store_dead) during which Raft is confused because it
does not find n4 at n3's address, but the cluster otherwise operates
normally. After the store times out, n4 is properly marked as down and
the log spam stops.
Release note (bug fix): CockroachDB now supports restarting a node at
an address previously allocated to another node.