
kvserver: Under race we use a different batch that could cause RevertRange to fail #60710

Description

@adityamaru

While writing tests for cluster-to-cluster streaming, we ran into a unique failure mode when running under race. In particular, once the stream ingestion job is "cut over", it attempts to RevertRange its target span to bring the cluster to a consistent state. RevertRange uses MVCCClearTimeRange, which issues a ClearRange if it is able to buffer a long enough (>64 KVs) run of keys to clear. Under race, this call uses a spanSetBatch instead of a pebbleBatch, per this logic:
https://github.com/cockroachdb/cockroach/blob/master/pkg/kv/kvserver/replica_write.go#L692
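
As a rough, self-contained sketch of that selection (the names pebbleLikeBatch, spanSetLikeBatch, raceEnabled, and newEvalBatch are illustrative stand-ins, not CockroachDB's actual types): under the race detector the raw batch gets wrapped in a span-asserting batch, so the extra validation described below only runs in race builds.

```go
package main

import "fmt"

// batch is a minimal stand-in for the engine batch RevertRange writes through.
type batch interface {
	clearRange(start, end []byte) error
}

// pebbleLikeBatch plays the role of pebbleBatch: no extra assertions.
type pebbleLikeBatch struct{}

func (pebbleLikeBatch) clearRange(start, end []byte) error { return nil }

// spanSetLikeBatch plays the role of spanSetBatch: it asserts the requested
// span before delegating to the wrapped batch.
type spanSetLikeBatch struct{ inner batch }

func (b spanSetLikeBatch) clearRange(start, end []byte) error {
	if string(start) >= string(end) {
		return fmt.Errorf("invalid span [%q, %q)", start, end)
	}
	return b.inner.clearRange(start, end)
}

// raceEnabled plays the role of util.RaceEnabled: true only in race builds.
const raceEnabled = true

func newEvalBatch() batch {
	if raceEnabled {
		return spanSetLikeBatch{inner: pebbleLikeBatch{}}
	}
	return pebbleLikeBatch{}
}

func main() {
	fmt.Println(newEvalBatch().clearRange([]byte("b"), []byte("c"))) // <nil>
	fmt.Println(newEvalBatch().clearRange([]byte("a"), []byte("a"))) // error: start not before end
}
```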

In the case of a spanSetBatch, there is an additional CheckAllowedRange() call in the stack that ensures the RevertRange request span is valid before clearing the keys:

```go
func (s *SpanSet) CheckAllowed(access SpanAccess, span roachpb.Span) error {
```

One of the checks in this method ensures that the StartKey in the RevertRange request is lexicographically before the EndKey:

```go
func (s Span) Valid() bool {
```
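
Roughly paraphrased (this is not the exact roachpb.Span.Valid implementation, just the property relied on here): when EndKey is set, the start key must sort strictly before it.

```go
package main

import (
	"bytes"
	"fmt"
)

// span mimics only the fields of roachpb.Span that matter here.
type span struct {
	key, endKey []byte
}

// valid paraphrases the property being checked: a point request has no end
// key; a ranged request must have its start key strictly before its end key.
func (s span) valid() bool {
	if len(s.endKey) == 0 {
		return true
	}
	return bytes.Compare(s.key, s.endKey) < 0
}

func main() {
	fmt.Println(span{key: []byte("a"), endKey: []byte("b")}.valid()) // true
	fmt.Println(span{key: []byte("a"), endKey: []byte("a")}.valid()) // false: start == end
}
```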

If the run of keys to be ClearRange'd by the RevertRangeRequest happens to consist of >64 versions of the same key, then the StartKey and EndKey of the resulting span are equal, thereby violating this invariant when run under race.
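
To make that concrete, here is a small illustration; mvccKey and spanOfRun are hypothetical simplifications of MVCCClearTimeRange's buffering, not the real code. When every buffered version shares the same user key, the span handed to ClearRange collapses to [k, k), which fails the check above.

```go
package main

import (
	"bytes"
	"fmt"
)

// mvccKey is a simplified versioned key: versions of the same user key differ
// only in their timestamp.
type mvccKey struct {
	key []byte
	ts  int64
}

// spanOfRun derives the ClearRange span from the first and last buffered keys
// of the run, which is the shape of the problem described above.
func spanOfRun(run []mvccKey) (start, end []byte) {
	return run[0].key, run[len(run)-1].key
}

func main() {
	// Buffer 65 (>64) versions of the single key "a".
	run := make([]mvccKey, 65)
	for i := range run {
		run[i] = mvccKey{key: []byte("a"), ts: int64(i + 1)}
	}
	start, end := spanOfRun(run)
	// start == end, so the span fails the validity check above; the ordinary
	// pebbleBatch (non-race build) never performs that check.
	fmt.Println(bytes.Compare(start, end) < 0) // false
}
```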

Jira issue: CRDB-3153

Labels

A-disaster-recovery, A-tenant-streaming (including cluster streaming), C-bug (code not up to spec/doc, specs & docs deemed correct; solution expected to change code/behavior)
