kvprober reports errors on a decommissioned node #104367

@joshimhoff

Describe the problem

Today, kvprober keeps running on a node after it is decommissioned. In CC, this is generally fine, since automation fully takes down nodes once they reach the decommissioned state. But there is a brief period in which a node is still running while in the decommissioned state, and during this period kvprober sometimes reports errors like the one below:

‹rpc error: code = PermissionDenied desc = n1 was permanently removed from...

To be clear, the errors are not wrong; they are expected once a node is decommissioned. Though they are not wrong per se, we will call them a bug, since they can lead to false-positive alerts.

To Reproduce

See test at #104365.

Behavior we want

No errors when a node enters the decommissioned state.
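One way to get this behavior, sketched below, is to skip probing once the node knows it is decommissioned and to treat a decommission-related error as skipped rather than failed. This is only an illustration under stated assumptions, not kvprober's actual code: `probeStats`, `runProbe`, `errNodeDecommissioned`, and `errProbeTimeout` are all hypothetical names, and real code would presumably inspect the gRPC `PermissionDenied` status instead of a sentinel error.

```go
package main

import (
	"errors"
	"fmt"
)

// errNodeDecommissioned is a hypothetical sentinel standing in for the
// PermissionDenied RPC error quoted in this issue.
var errNodeDecommissioned = errors.New("node was permanently removed from the cluster")

// errProbeTimeout is a hypothetical example of a genuine probe failure.
var errProbeTimeout = errors.New("probe timed out")

// probeStats is a hypothetical, simplified stand-in for probe metrics.
type probeStats struct {
	attempts int
	failures int
	skipped  int
}

// runProbe runs one probe unless the node is already known to be
// decommissioned. A probe error caused by decommissioning is counted as
// skipped, not failed, so it cannot trigger a failure-based alert.
func runProbe(s *probeStats, decommissioned bool, probe func() error) {
	if decommissioned {
		s.skipped++
		return
	}
	s.attempts++
	err := probe()
	switch {
	case err == nil:
		// Successful probe: nothing more to record.
	case errors.Is(err, errNodeDecommissioned):
		// The node was decommissioned before the local check noticed.
		s.skipped++
	default:
		s.failures++
	}
}

func main() {
	var s probeStats
	runProbe(&s, false, func() error { return nil })
	runProbe(&s, false, func() error { return fmt.Errorf("probe: %w", errNodeDecommissioned) })
	runProbe(&s, true, func() error { return errProbeTimeout })
	runProbe(&s, false, func() error { return errProbeTimeout })
	fmt.Printf("%+v\n", s) // prints {attempts:3 failures:1 skipped:2}
}
```

Handling both the status check and the error classification covers the window in which the node's own view of its decommissioned state lags behind the rest of the cluster.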

Jira issue: CRDB-28490

Metadata

Labels

A-kv: Anything in KV that doesn't belong in a more specific category.
A-kv-observability
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-sre: For issues SRE opened or otherwise cares about tracking.
v23.1.9
