twmb (Owner) commented Dec 4, 2025

See doc.

Closes #1190.

travisdowns added a commit to travisdowns/redpanda that referenced this pull request Dec 4, 2025
Tests are flaking when kgo-repeater dies with a
non-retriable error right after connecting. This happens when franz-go
gets EOF right after connecting but before receiving any response. As
a heuristic, it assumes this is a SASL misconfiguration, since that is
how a broker behaves in that case. However, this can also occur because
Redpanda is stopped/killed after the connection is made but before
the initial requests can be responded to.

This means the producer will fail if a broker dies/is killed/stops
during a critical window between the connection and receiving the first
response. This is reasonably likely in stress/chaos tests where
producers are being started and stopped all the time.

This behavior is a relatively recent change in franz-go (~6 months
ago), which we picked up a few months ago via a franz-go upgrade.

To fix this, we make use of a proposed new option to franz-go, from
this PR:

twmb/franz-go#1198

This PR is not yet merged, so we pull in its SHA directly. When
a franz-go release is made with this change, we can update to that.

Details in CORE-14849.

Fixes CORE-14898.

Development

Successfully merging this pull request may close these issues.

Add AlwaysRetryEOF configuration option