rpc: clusterID check can fail to prevent two clusters from interacting #37907

**Describe the problem**

While testing CockroachDB, a node hit a raft panic. The error log is as follows:
```Tocommit(61) is out of range [lastIndex(59)]. Was the raft log corrupted, truncated, or lost?```
After careful investigation, I found that raft logs from another cluster had been sent to the localhost:28253 process. Messages are being delivered across different clusters.

**To Reproduce**

The steps to reproduce are as follows:

1. Start the three nodes of cluster1 (node1, node2, node3):
   - node1 -> localhost:28251
   - node2 -> localhost:28252
   - node3 -> localhost:28253

   Create a table on cluster1's node1 and keep inserting data.
2. Kill cluster1's node3 process (localhost:28253) and clear node3's data. Keep inserting data through cluster1's node1.
3. Start the three nodes of cluster2 (node1, node2, node3):
   - node1 -> localhost:28254
   - node2 -> localhost:28255
   - node3 -> localhost:28253

At this point, cluster1 still sees the same address and port for node3 (localhost:28253), so it assumes this is still cluster1's node3, although the process there now belongs to cluster2. cluster1's nodes send gRPC heartbeats to node3. Normally node3 detects the clusterID mismatch when it receives such a heartbeat and returns an error, so cluster1's nodes consider the connection to localhost:28253 unhealthy and do not send raft logs to node3.

However, in the instant right after node3 starts, it has not yet obtained its clusterID, so the mismatch cannot be detected:
https://github.com/cockroachdb/cockroach/blob/master/pkg/rpc/heartbeat.go#L96
During this window, a successful PingResponse is returned to cluster1. Once cluster1 receives it, cluster1's raft log is sent to node3, which causes node3's raft process to panic.
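
To make the window described above concrete, here is a minimal sketch of a lenient cluster ID check. It is not the code in `heartbeat.go`: the type `heartbeatServer`, the method `ping`, and the string-based `ClusterID` are hypothetical stand-ins, and the sketch assumes the comparison is only performed once both sides know their cluster ID.

```go
package main

import (
	"errors"
	"fmt"
)

// ClusterID is a hypothetical stand-in for the real cluster UUID type.
// The empty string plays the role of "not yet known".
type ClusterID string

// heartbeatServer is an illustrative model, not the actual heartbeat service.
type heartbeatServer struct {
	clusterID ClusterID // empty until the node has learned its cluster ID
}

var errMismatch = errors.New("client cluster ID doesn't match server cluster ID")

// ping models a lenient check (an assumption for this sketch): the IDs are
// compared only when both the client and the server already know theirs.
func (s *heartbeatServer) ping(clientID ClusterID) error {
	if clientID != "" && s.clusterID != "" && clientID != s.clusterID {
		return errMismatch
	}
	return nil
}

func main() {
	// node3 has just been restarted and has not obtained its cluster ID yet.
	node3 := &heartbeatServer{}

	// A heartbeat from cluster1 arrives during the startup window.
	fmt.Println("during startup:", node3.ping("cluster1"))

	// node3 learns that it belongs to cluster2; the same heartbeat arrives again.
	node3.clusterID = "cluster2"
	fmt.Println("after startup:", node3.ping("cluster1"))
}
```

In this model the first heartbeat is accepted because the server side of the comparison is still empty, while the second is rejected with a mismatch error. That matches the sequence reported above: a single successful response is enough for cluster1 to treat the connection as healthy and start sending raft traffic.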

