Skip to content

Conversation

@pav-kv
Copy link
Contributor

@pav-kv pav-kv commented Mar 8, 2023

There is a potentially unlimited memory usage spike possible in raft.hup() which reads all unapplied committed entries in order to check that there are no unapplied config changes. This PR paginates this scan so that the spike is limited to MaxCommittedSizePerReady. It also terminates the scan early if a config change has been found.

An example effect of this change can be seen below (memory usage in CockroachDB). Before, there was a large spike to 12 GiB:

image

After, the scan was smeared out, and the memory usage is stable around 1 GiB:

image

@pav-kv pav-kv marked this pull request as ready for review March 8, 2023 14:38
@pav-kv
Copy link
Contributor Author

pav-kv commented Mar 8, 2023

@tbg @ahrtr PTAL

Copy link
Contributor

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is what I expected, I'm a bit surprised to see the discussion on it. Perhaps the two of you can clue me in on what the discussion is about, I was mostly confused reading it, so am likely missing something.

@ahrtr
Copy link
Member

ahrtr commented Jun 7, 2023

The PR looks good to me. Can you rebase this PR although there is no conflict? There are huge differences between this PR and the main branch.

@pav-kv
Copy link
Contributor Author

pav-kv commented Jun 7, 2023

@ahrtr Sure, will rebase. Please don't merge yet, there is one suggested refactoring from @tbg to address.

@pav-kv pav-kv force-pushed the rm-unlimited-read branch from 26173de to 2751060 Compare June 7, 2023 08:23
@pav-kv pav-kv requested review from ahrtr and tbg June 7, 2023 08:28
@pav-kv
Copy link
Contributor Author

pav-kv commented Jun 7, 2023

@ahrtr Rebased.
@tbg Added the scan method to raftLog, per your suggestion.

Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Nice improvement! thx @pavelkalinnikov

@pav-kv pav-kv force-pushed the rm-unlimited-read branch from 2751060 to 1df7629 Compare June 7, 2023 10:57
@tbg tbg merged commit 515b142 into etcd-io:main Jun 7, 2023
@pav-kv pav-kv deleted the rm-unlimited-read branch June 7, 2023 11:32
pav-kv added a commit to pav-kv/cockroach that referenced this pull request Jun 7, 2023
This includes etcd-io/raft#32 which fixes an
out-of-memory scenario during election with large unapplied Raft logs.

Part of cockroachdb#104402
Epic: none
Release note (bug fix): Fixed a bug in upstream etcd-io/raft which could result
in pulling unlimited amount of log into memory, and lead to out-of-memory
situations. With the fix, the log scan has a limited memory footprint.
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Jun 8, 2023
104483: go.mod: upgrade etcd-io/raft to 515b142 r=tbg a=pavelkalinnikov

This includes etcd-io/raft#32 which fixes an out-of-memory scenario during election with large unapplied Raft logs.

Part of #104402
Epic: none
Release note (bug fix): Fixed a bug in upstream etcd-io/raft which could result in pulling unlimited amount of log into memory, and lead to out-of-memory situations. With the fix, the log scan has a limited memory footprint.

Co-authored-by: Pavel Kalinnikov <[email protected]>
pav-kv added a commit to cockroachdb/etcd that referenced this pull request Jun 15, 2023
This commit patches etcd-io/raft#32 which fixes an out-of-memory scenario
during election with large unapplied Raft logs.

The patch does apply and build cleanly, so there is a fixup commit on top.

Commits included:
313e96ce    raft: factor out unapplied config changes check
a22dbb1e    raft: don't scan unapplied entries if applied==committed
2cb76edb    raft: inline numOfPendingConf() != 0 check
d980ecd4    raft: paginate the unapplied config changes scan
1df76294    raft: add raftLog.scan method for paginated reads

Signed-off-by: Pavel Kalinnikov <[email protected]>
pav-kv added a commit to cockroachdb/etcd that referenced this pull request Jun 15, 2023
This commit fixes the build after porting etcd-io/raft#32.

Signed-off-by: Pavel Kalinnikov <[email protected]>
pav-kv added a commit to cockroachdb/raft that referenced this pull request Jun 15, 2023
…-read

raft: paginate the unapplied config changes scan

There is a potentially unlimited memory usage spike possible in raft.hup()
which reads all unapplied committed entries in order to check that there are no
unapplied config changes. This PR paginates this scan so that the spike is
limited to MaxCommittedSizePerReady. It also terminates the scan early if a
config change has been found.

Signed-off-by: Pavel Kalinnikov <[email protected]>
pav-kv added a commit to pav-kv/cockroach that referenced this pull request Jun 15, 2023
This dependency upgrade includes a patched PR 32 from etcd-io/raft which fixes
an out-of-memory scenario during election with large unapplied Raft logs.

Part of cockroachdb#104402
Epic: none
Release note (bug fix): Fixed a bug in upstream etcd-io/raft which could result
in pulling unlimited amount of log into memory, and lead to out-of-memory
situations. With the fix, the log scan has a limited memory footprint.
Release justification: fixing a bug after a customer escalation
overvenus added a commit to tikv/raft-rs that referenced this pull request Oct 15, 2023
Fix potentially unlimited memory usage spike possible in raft.hup() which reads
all unapplied committed entries in order to check that there are no unapplied
config changes. This PR paginates this scan so that the spike is limited to
MaxCommittedSizePerReady. It also terminates the scan early if a config change
has been found.

It is ported from etcd-io/raft#32.

Signed-off-by: Neil Shen <[email protected]>
overvenus added a commit to tikv/raft-rs that referenced this pull request Oct 17, 2023
Fix potentially unlimited memory usage spike possible in raft.hup() which reads
all unapplied committed entries in order to check that there are no unapplied
config changes. This PR paginates this scan so that the spike is limited to
MaxCommittedSizePerReady. It also terminates the scan early if a config change
has been found.

It is ported from etcd-io/raft#32.

Signed-off-by: Neil Shen <[email protected]>
tisonkun added a commit to tisonkun/mephisto-archive that referenced this pull request Oct 18, 2023
Fix potentially unlimited memory usage spike possible in raft.hup() which reads
all unapplied committed entries in order to check that there are no unapplied
config changes. This PR paginates this scan so that the spike is limited to
MaxCommittedSizePerReady. It also terminates the scan early if a config change
has been found.

It is ported from etcd-io/raft#32.
It is ported from tikv/raft-rs#530.

Signed-off-by: tison <[email protected]>
tisonkun added a commit to tisonkun/mephisto-archive that referenced this pull request Oct 18, 2023
Fix potentially unlimited memory usage spike possible in raft.hup()
which reads all unapplied committed entries in order to check that there
are no unapplied config changes. This PR paginates this scan so that the
spike is limited to MaxCommittedSizePerReady. It also terminates the
scan early if a config change has been found.

It is ported from etcd-io/raft#32.

It is ported from tikv/raft-rs#530.

Signed-off-by: tison <[email protected]>
@ahrtr ahrtr added this to the v3.6.0 milestone Jan 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants