(2.14) [FIXED] Filestore recovers from partial purge after hard kill #7676
If the server was hard killed during a purge operation, it could come up entirely empty. This was because the purge operation moved the entire `msgs` directory to `__msgs__` and only afterwards created the file with the required tombstone. A hard kill between these two operations would see the stream fully reset back to zero.

This PR proposes an intermediate step and a re-ordering of operations:
1. Create a new `__new_msgs__` directory and move the tombstone file into it.
2. Move the `msgs` directory to `__msgs__`, and remove `__msgs__` asynchronously (this was already the case).
3. Move the `__new_msgs__` directory back to `msgs`.

The server will then properly recover in all cases and never roll back the stream sequence:
- If moving `__new_msgs__` to `msgs` fails, we retry that on restart.
- If the `__msgs__` purge directory still exists after restart, it's removed (this was already the case).
- If the `__new_msgs__` directory was created but the tombstone file was not moved into it prior to the hard kill, the `__new_msgs__` directory is removed on restart. The data is still preserved, except for one additional tombstone. Retrying the purge command will clean this up.

There was no error handling for these disk operations before, and this PR does not introduce it; that should be done separately.
If this PR is merged, we should probably also include a compatibility commit for 2.12. Otherwise, a hard kill on this new version that creates a `__new_msgs__` directory with a tombstone, followed by a downgrade to 2.12, more purge operations, and then a restart back on 2.14, would see the stream revert to the sequences of that initial partial purge (since 2.14 would recognize that directory as needing to remove the `msgs` directory and replace it with `__new_msgs__`). The compatibility commit would only remove the `__new_msgs__` directory when seen, just like the `__msgs__` purge directory.

Signed-off-by: Maurice van Veen [email protected]