GracefulMasterTakeover logic flaw in semi-sync replication scenario #1508
Description
Consider the following topology:
    master (
      Rpl_semi_sync_master_status=1
      rpl_semi_sync_master_wait_no_slave=1
      rpl_semi_sync_master_wait_for_slave_count=2
    )
    |- replica1 (Rpl_semi_sync_slave_status=1)
    |- replica2 (Rpl_semi_sync_slave_status=1)
    `- replica3 (Rpl_semi_sync_slave_status=1)
Assume that replica1 is chosen as the new master. Based on the source code, Orchestrator first has the new master, replica1, take over its siblings (the old master's other replicas, replica2 and replica3).
The topology then becomes:
    master (
      Rpl_semi_sync_master_status=1
      rpl_semi_sync_master_wait_no_slave=1
      rpl_semi_sync_master_wait_for_slave_count=2
    )
    `- replica1 (Rpl_semi_sync_slave_status=1)
       |- replica2 (Rpl_semi_sync_slave_status=1)
       `- replica3 (Rpl_semi_sync_slave_status=1)
This presents a problem. The old master has rpl_semi_sync_master_wait_for_slave_count=2 but now has only one replica (replica1), so every DML commit blocks while waiting for a second ACK that cannot arrive.
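The blocking condition can be sketched in a few lines of Go. This is a simplified model for illustration, not Orchestrator or MySQL code: the `semiSyncConfig` type and `commitsWouldBlock` function are hypothetical names, and the model assumes semi-sync is active on the master and that, with rpl_semi_sync_master_wait_no_slave=1, the master keeps waiting (up to rpl_semi_sync_master_timeout) even when fewer semi-sync replicas are connected than rpl_semi_sync_master_wait_for_slave_count requires.

```go
package main

import "fmt"

// semiSyncConfig is a hypothetical struct holding the two settings
// from the issue that decide whether a commit waits for ACKs.
type semiSyncConfig struct {
	waitNoSlave       bool // rpl_semi_sync_master_wait_no_slave
	waitForSlaveCount int  // rpl_semi_sync_master_wait_for_slave_count
}

// commitsWouldBlock reports whether a commit on the master would wait
// for ACKs that the currently attached semi-sync replicas cannot supply.
// Simplified model: assumes semi-sync is enabled on the master.
func commitsWouldBlock(cfg semiSyncConfig, semiSyncReplicas int) bool {
	return cfg.waitNoSlave && semiSyncReplicas < cfg.waitForSlaveCount
}

func main() {
	cfg := semiSyncConfig{waitNoSlave: true, waitForSlaveCount: 2}

	// Before the takeover: replica1, replica2 and replica3 are attached.
	fmt.Println("before takeover, blocked:", commitsWouldBlock(cfg, 3))

	// After replica1 takes over its siblings: only replica1 is attached.
	fmt.Println("after takeover, blocked:", commitsWouldBlock(cfg, 1))
}
```

Under these assumptions the first call reports no blocking and the second reports blocking, which matches the scenario described above.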
Next, Orchestrator attempts to set read_only=1 on the old master. Because DML is already blocked on the old master waiting for ACKs, the set read_only=1 statement blocks as well. If rpl_semi_sync_master_timeout is effectively infinite, the switchover hangs indefinitely, because ExecInstance has no timeout.
Even if rpl_semi_sync_master_timeout is finite, this situation significantly lengthens the switchover and so has a greater impact on the business.
In contrast, MHA's switchover avoids this issue because it proceeds in the following order:
- Block writes on the old master.
- Wait for the new master to catch up, then remove its read-only restriction; at this point, the business can resume operations.
- Repoint all other replicas of the old master (everything except the new master) to the new master.
- Finally, repoint the old master to the new master with CHANGE MASTER TO.
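The effect of the ordering can be illustrated with a small simulation. It is a sketch under stated assumptions, not code from either tool: the `step` type and `replicasWhenWritesBlocked` are hypothetical, and only the two steps whose order matters here are modeled (the Orchestrator order is taken from the description in this issue).

```go
package main

import "fmt"

type step string

const (
	blockWrites     step = "block writes on old master"
	repointSiblings step = "repoint siblings to new master"
)

// replicasWhenWritesBlocked walks the steps in order and returns how many
// replicas are still attached to the old master at the moment writes are
// blocked. It returns -1 if the order never blocks writes.
func replicasWhenWritesBlocked(order []step, replicas int) int {
	for _, s := range order {
		switch s {
		case repointSiblings:
			// Only the new master stays attached to the old master.
			if replicas > 1 {
				replicas = 1
			}
		case blockWrites:
			return replicas
		}
	}
	return -1
}

func main() {
	const waitForSlaveCount = 2 // rpl_semi_sync_master_wait_for_slave_count

	orchestratorOrder := []step{repointSiblings, blockWrites}
	mhaOrder := []step{blockWrites, repointSiblings}

	o := replicasWhenWritesBlocked(orchestratorOrder, 3)
	m := replicasWhenWritesBlocked(mhaOrder, 3)

	fmt.Println("orchestrator order: replicas =", o, "enough ACKs:", o >= waitForSlaveCount)
	fmt.Println("MHA order:          replicas =", m, "enough ACKs:", m >= waitForSlaveCount)
}
```

In this model the MHA order blocks writes while all three replicas can still acknowledge, whereas the order described for Orchestrator blocks writes with only one replica attached.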
I don’t understand why Orchestrator first lets the new master (replica1) take over its siblings before blocking writes on the old master. This ordering introduces a problem that MHA avoids.