Description
While writing tests for cluster-to-cluster streaming, we ran into an unusual failure mode when running under race. Once the stream ingestion job is cut over, it issues a RevertRange over its target span to bring the cluster to a consistent state. RevertRange uses MVCCClearTimeRange, which issues a ClearRange when it is able to buffer a long enough (> 64 KVs) run of keys to clear. Under race, this call is made against a spanSetBatch instead of a pebbleBatch, as per this logic:
https://github.com/cockroachdb/cockroach/blob/master/pkg/kv/kvserver/replica_write.go#L692
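The threshold decision described above can be sketched as follows. This is a minimal Go model under stated assumptions, not the code in MVCCClearTimeRange. The names mvccKey, clearPlan, planClear and clearRangeThreshold are hypothetical. The threshold of 64 is taken from the figure above. The range clear is assumed to end, exclusively, at the first key after the run that must be kept.

```go
package main

import "fmt"

// clearRangeThreshold mirrors the "> 64 KVs" figure from the issue text.
// The name is hypothetical.
const clearRangeThreshold = 64

// mvccKey is a simplified stand-in for an MVCC key: a user key plus a
// version timestamp (hypothetical type).
type mvccKey struct {
	Key string
	TS  int64
}

// clearPlan describes how a buffered run of keys would be removed.
type clearPlan struct {
	UseRangeClear bool      // true: one range clear over [Start, End)
	Start, End    mvccKey   // bounds of the range clear; End is exclusive
	Points        []mvccKey // keys to clear one by one otherwise
}

// planClear chooses between a single range clear and per-key clears for a
// buffered run. next is assumed to be the first key after the run that must
// be kept, and serves as the exclusive end of the range clear.
func planClear(run []mvccKey, next mvccKey) clearPlan {
	if len(run) > clearRangeThreshold {
		return clearPlan{UseRangeClear: true, Start: run[0], End: next}
	}
	return clearPlan{Points: run}
}

func main() {
	for _, n := range []int{64, 65} {
		run := make([]mvccKey, n)
		for i := range run {
			run[i] = mvccKey{Key: fmt.Sprintf("k%03d", i), TS: 1}
		}
		p := planClear(run, mvccKey{Key: "zzz", TS: 1})
		fmt.Printf("%d keys -> range clear: %v\n", n, p.UseRangeClear)
	}
	// Prints:
	// 64 keys -> range clear: false
	// 65 keys -> range clear: true
}
```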
With a spanSetBatch, the call stack includes an additional CheckAllowedRange() call that ensures the RevertRange request span is valid before the keys are cleared:
cockroach/pkg/kv/kvserver/spanset/spanset.go, line 224 in d12d281:
func (s *SpanSet) CheckAllowed(access SpanAccess, span roachpb.Span) error {
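A simplified model of the extra check that a spanSetBatch adds might look like the sketch below. All names (span, writer, plainBatch, checkedBatch) are hypothetical. The real SpanSet.CheckAllowed also takes the access type into account, and this sketch ignores that. It only illustrates the shape of the idea: a wrapper that validates a span before delegating to an unchecked batch.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// span is a simplified [Key, EndKey) key span (hypothetical stand-in for
// roachpb.Span).
type span struct {
	Key, EndKey []byte
}

// writer is the minimal batch interface used in this sketch.
type writer interface {
	ClearRange(start, end []byte) error
}

// plainBatch plays the unchecked-batch role: it just records what it was
// asked to clear.
type plainBatch struct {
	cleared []span
}

func (b *plainBatch) ClearRange(start, end []byte) error {
	b.cleared = append(b.cleared, span{Key: start, EndKey: end})
	return nil
}

// checkedBatch plays the spanSetBatch role: before delegating, it verifies
// that the requested span is well formed (start sorts strictly before end)
// and lies inside one of the spans the request declared.
type checkedBatch struct {
	declared []span
	inner    writer
}

func (b *checkedBatch) ClearRange(start, end []byte) error {
	if bytes.Compare(start, end) >= 0 {
		return fmt.Errorf("invalid span [%q, %q)", start, end)
	}
	for _, d := range b.declared {
		if bytes.Compare(d.Key, start) <= 0 && bytes.Compare(end, d.EndKey) <= 0 {
			return b.inner.ClearRange(start, end)
		}
	}
	return errors.New("span not declared")
}

func main() {
	inner := &plainBatch{}
	b := &checkedBatch{declared: []span{{Key: []byte("a"), EndKey: []byte("z")}}, inner: inner}
	fmt.Println(b.ClearRange([]byte("b"), []byte("c"))) // <nil>
	fmt.Println(b.ClearRange([]byte("b"), []byte("b"))) // invalid span ["b", "b")
	fmt.Println(len(inner.cleared))                     // 1
}
```

An unchecked batch would accept the empty span silently; only the wrapper rejects it, which is why the failure shows up only when the checking batch is in use.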
One of the checks in this method ensures that the StartKey of the RevertRange request span is lexicographically before its EndKey:
Line 2178 in d12d281:
func (s Span) Valid() bool {
If the run of keys to be ClearRange'd by the RevertRangeRequest happens to consist of more than 64 versions of the same key, then the StartKey and EndKey are equal (every version shares the same key), which violates this invariant when run under race.
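To illustrate why a run made up of versions of a single key trips the check, here is a hypothetical Go re-implementation of the ordering rule. It is not copied from roachpb: span, versionedKey and runSpan are illustrative names, and treating an empty EndKey as a single-key span is an assumption of this sketch. Because the span check only sees user keys, not timestamps, a run that starts and ends on versions of the same key produces an empty span such as ["a", "a"), which fails the check.

```go
package main

import (
	"bytes"
	"fmt"
)

// span is a simplified stand-in for roachpb.Span.
type span struct {
	Key, EndKey []byte
}

// valid reports whether s is well formed in this sketch: a non-empty EndKey
// must sort strictly after Key. An empty EndKey is treated as a single-key
// span (an assumption of this sketch).
func (s span) valid() bool {
	if len(s.Key) == 0 && len(s.EndKey) == 0 {
		return false
	}
	if len(s.EndKey) == 0 {
		return true
	}
	return bytes.Compare(s.Key, s.EndKey) < 0
}

// versionedKey pairs a user key with a timestamp (hypothetical type).
type versionedKey struct {
	Key []byte
	TS  int64
}

// runSpan derives the user-key span that gets checked for a range clear
// starting at first and ending (exclusively) at next. Timestamps are dropped,
// mirroring a check that only looks at user keys. Hypothetical helper.
func runSpan(first, next versionedKey) span {
	return span{Key: first.Key, EndKey: next.Key}
}

func main() {
	// A run that ends at a different user key yields a valid span.
	fmt.Println(runSpan(versionedKey{[]byte("a"), 5}, versionedKey{[]byte("b"), 5}).valid()) // true
	// A run of versions of "a" that ends at an older version of "a" yields
	// the empty span ["a", "a"), which the ordering check rejects.
	fmt.Println(runSpan(versionedKey{[]byte("a"), 100}, versionedKey{[]byte("a"), 1}).valid()) // false
}
```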
Jira issue: CRDB-3153