@MauriceVanVeen

If the server was hard killed during the purge operation, it could come up entirely empty. This was because the purge operation moved the entire msgs directory to __msgs__ and only afterwards created the file with the required tombstone. A hard kill between these two operations would see the stream fully reset back to zero.

This PR proposes an intermediate step and re-ordering of operations:

  • Create the message block with the tombstone first.
  • Move the block with the tombstone out to a new directory, __new_msgs__.
  • Rename the msgs directory to __msgs__, and remove __msgs__ asynchronously (this was already the case).
  • Rename the __new_msgs__ directory back to msgs.

The server will then properly recover in all cases and never roll back the stream sequence:

  • If the rename from __new_msgs__ to msgs fails, we retry that on restart.
  • If the __msgs__ purge directory still exists after restart, it's removed (this was already the case).
  • If the __new_msgs__ directory was created but the tombstone file was not moved into it prior to the hard kill, the __new_msgs__ directory is removed on restart. The data is still preserved, except for one additional tombstone existing. Retrying the purge command will clean this up.

There was no error handling of disk operations before, and this PR does not introduce any. That should be done separately.

If this PR is merged, we should probably also include a compatibility commit for 2.12. Otherwise, the following sequence would revert the stream to the sequences of the initial partial purge: a hard kill on this new version that leaves behind a __new_msgs__ directory with a tombstone, then a downgrade to 2.12 with more purge operations, and then a restart back on 2.14. On that restart, 2.14 would treat the leftover __new_msgs__ as a signal to remove the msgs directory and replace it with __new_msgs__. The compatibility commit would purely be to remove the __new_msgs__ directory when seen, just like the __msgs__ purge directory.

Signed-off-by: Maurice van Veen [email protected]

@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner December 23, 2025 14:17