Member

@cbi42 cbi42 commented Nov 8, 2024

Summary: The bug only affects TransactionDB with two-phase commit (2PC). The main change is in MemTableList::TryInstallMemtableFlushResults. Before this fix, memtables_to_flush might not include all flushed memtables, which made the min_log_number computed for the flush incorrect. The code path for calculating min_log_number is MemTableList::TryInstallMemtableFlushResults() -> GetDBRecoveryEditForObsoletingMemTables() -> PrecomputeMinLogNumberToKeep2PC() -> FindMinPrepLogReferencedByMemTable(). Inside FindMinPrepLogReferencedByMemTable(), all memtables being flushed must be excluded.

The PR also includes some documentation changes.

Test plan: added a new unit test that fails before this change.
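
To illustrate why the set of excluded memtables matters, here is a minimal toy model of the exclusion step described in the summary. It is only a sketch and not the RocksDB implementation: `ToyMemTable`, its fields, and `MinPrepLogReferenced` are hypothetical names, and the convention that 0 means "no prepared section referenced" is an assumption made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Toy stand-in for a read-only memtable; NOT the RocksDB class.
struct ToyMemTable {
  uint64_t id;
  // Smallest log number containing a prepared section that this memtable
  // still references. 0 means "none" (an assumption of this sketch).
  uint64_t min_prep_log;
};

// Hypothetical analogue of FindMinPrepLogReferencedByMemTable(): returns the
// smallest non-zero min_prep_log among memtables NOT listed in `to_flush`,
// or 0 if no remaining memtable references a prepared section.
uint64_t MinPrepLogReferenced(const std::vector<ToyMemTable>& memtables,
                              const std::unordered_set<uint64_t>& to_flush) {
  uint64_t min_log = 0;
  for (const ToyMemTable& m : memtables) {
    if (to_flush.count(m.id) > 0) {
      continue;  // Being flushed, so it no longer pins a log.
    }
    if (m.min_prep_log == 0) {
      continue;  // References no prepared section.
    }
    if (min_log == 0 || m.min_prep_log < min_log) {
      min_log = m.min_prep_log;
    }
  }
  return min_log;
}
```

In this model, with memtables `{1, 5}`, `{2, 7}`, and `{3, 9}` where memtables 1 and 2 are flushed together, passing the full set `{1, 2}` yields 9, while passing only `{1}` yields 7. An incomplete set changes the computed minimum, which is the kind of discrepancy the summary describes.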

// TODO(myabandeh): Not sure how batch_count could be 0 here.
if (batch_count > 0) {
Member Author

Renamed for better understanding.

Contributor

This actually seems more than a rename. I think the number of "memtable batches" and flushed memtables can actually be different if min_write_buffer_number_to_merge != 1. I'm wondering if that option has been broken all along?

Member Author

@cbi42 cbi42 Nov 11, 2024

It seems the variable has been misnamed since it was introduced: 5aaef91#diff-398125fa2b06616bca3e9aaeeacb38d0f251321cc15722e4b636d20d1dfb1d6a. The way it's used, it seems to be the number of memtables flushed.

edit: TODO: the logging in

rocksdb/db/memtable_list.cc

Lines 767 to 770 in 1f0ccd9

while (batch_count-- > 0) {
  ReadOnlyMemTable* m = current_->memlist_.back();
  if (m->edit_.GetBlobFileAdditions().empty()) {
    ROCKS_LOG_BUFFER(log_buffer,

is redundant when multiple memtables are flushed into the same file.

@facebook-github-bot
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cbi42 cbi42 requested a review from ltamasi November 8, 2024 20:36

  uint64_t PrecomputeMinLogContainingPrepSection(
      const std::unordered_set<ReadOnlyMemTable*>* memtables_to_flush =
-         nullptr);
+         nullptr) const;
Contributor

👍👍

@facebook-github-bot
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@cbi42 merged this pull request in 925435b.

ybtsdst pushed a commit to ybtsdst/rocksdb that referenced this pull request Apr 27, 2025
Summary:
The bug only affects TransactionDB with two-phase commit (2PC). The main change is in `MemTableList::TryInstallMemtableFlushResults`. Before this fix, `memtables_to_flush` might not include all flushed memtables, which made the min_log_number computed for the flush incorrect. The code path for calculating min_log_number is `MemTableList::TryInstallMemtableFlushResults() -> GetDBRecoveryEditForObsoletingMemTables() -> PrecomputeMinLogNumberToKeep2PC() -> FindMinPrepLogReferencedByMemTable()`. Inside `FindMinPrepLogReferencedByMemTable()`, all memtables being flushed must be excluded.

The PR also includes some documentation changes.

Pull Request resolved: facebook#13127

Test Plan: added a new unit test that fails before this change.

Reviewed By: ltamasi

Differential Revision: D65679270

Pulled By: cbi42

fbshipit-source-id: 611f34bd6ef4cba51f8b54cb1be416887b5a9c5e