Conversation

crepererum (Collaborator)

This is required to provide a race-free way to start at the earliest or
latest offset: fetching the offset first and then creating a stream can
fail, because in the meantime the Kafka retention policy may already
have removed the offset in question.
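The race can be illustrated with a toy model. All types and names below are hypothetical stand-ins for illustration only, not rskafka's actual API:

```rust
#[derive(Debug, PartialEq)]
enum FetchError {
    OffsetOutOfRange,
}

/// Toy stand-in for a Kafka partition whose retained offsets start at `start`.
struct Partition {
    start: i64,
}

impl Partition {
    fn earliest_offset(&self) -> i64 {
        self.start
    }

    /// Starting a stream at an offset that retention already removed fails.
    fn stream_from(&self, offset: i64) -> Result<i64, FetchError> {
        if offset < self.start {
            Err(FetchError::OffsetOutOfRange)
        } else {
            Ok(offset)
        }
    }

    /// Retention policy removes everything below `new_start`.
    fn expire_until(&mut self, new_start: i64) {
        self.start = new_start;
    }
}

fn main() {
    let mut p = Partition { start: 0 };

    // Racy two-step approach: fetch the earliest offset first...
    let offset = p.earliest_offset();
    // ...but retention kicks in before the stream is created.
    p.expire_until(5);
    assert_eq!(p.stream_from(offset), Err(FetchError::OffsetOutOfRange));

    // Race-free: resolve "earliest" at the moment the stream actually starts.
    assert_eq!(p.stream_from(p.earliest_offset()), Ok(5));
}
```

Letting the consumer resolve "earliest"/"latest" internally at stream-start time closes the window in which retention can invalidate a separately fetched offset.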

@crepererum crepererum requested a review from tustvold February 15, 2022 16:53
@crepererum crepererum force-pushed the crepererum/consumer_stream_earliest_latest branch from b8875e3 to 14e701b Compare February 15, 2022 16:54
tustvold (Contributor) left a comment:
Looks good to me, had some minor comments

// Remember used offset (might be overwritten if there was any data) so we don't refetch the
// earliest / latest offset for every try. Also fetching the latest offset might be racy otherwise,
// since we'll never be in a position where the latest one can actually be fetched.
*this.next_offset = Some(used_offset);
tustvold (Contributor):

I don't understand this. I'm not sure under what circumstances Kafka would return no records, but if that did occur, wouldn't we just get stuck in an infinite loop?

crepererum (Collaborator, Author):

So the issue here is the "latest" offset: if you ask Kafka for it on a new or empty partition, it returns the start of that empty interval (e.g. 0 for an empty partition); see

rskafka/tests/client.rs, lines 241 to 251 at 15f424f:

    assert_eq!(
        partition_client
            .get_offset(OffsetAt::Earliest)
            .await
            .unwrap(),
        0
    );
    assert_eq!(
        partition_client.get_offset(OffsetAt::Latest).await.unwrap(),
        0
    );

Now if you don't remember that offset, then by the time you produce something to the partition, the "latest" offset (which is the offset of the latest record + 1) will have increased as well, so you're always chasing the tail of the partition but never fetch any records. So when you first get an answer to "what's the earliest/latest offset?", you need to remember it (except when it turns out to be wrong due to "offset out of range").

When the partition stays empty, then sure, this loops forever and asks Kafka for data again and again. However, this is not a hot loop, since the fetch request has a timeout attached, so the broker will wait some milliseconds before telling us "there are no records here, just like the last X times you've asked".
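The "remember the first resolved offset" fix can be sketched with a toy model. `Partition`, `latest_offset`, and `fetch` below are hypothetical names for illustration only; the real stream polls a Kafka broker:

```rust
// Toy model of "resolve 'latest' once, then remember it" as described above.
struct Partition {
    records: Vec<&'static str>,
}

impl Partition {
    /// "Latest" offset = offset of the last record + 1 (0 for an empty partition).
    fn latest_offset(&self) -> i64 {
        self.records.len() as i64
    }

    /// Return all records at or after `offset`.
    fn fetch(&self, offset: i64) -> Vec<&'static str> {
        self.records.iter().skip(offset as usize).copied().collect()
    }
}

fn main() {
    let mut partition = Partition { records: vec![] };

    // First poll against an empty partition: resolve "latest" once and remember it.
    let mut next_offset: Option<i64> = None;
    let used_offset = *next_offset.get_or_insert_with(|| partition.latest_offset());
    assert_eq!(used_offset, 0);
    assert!(partition.fetch(used_offset).is_empty());

    // Data arrives after we resolved "latest". Because we remembered offset 0
    // instead of re-resolving "latest" (which would now be 1), we still see it.
    partition.records.push("hello");
    let records = partition.fetch(next_offset.unwrap());
    assert_eq!(records, vec!["hello"]);
}
```

Without the remembered offset, each retry would re-resolve "latest" to one past the newest record and the consumer would chase the tail forever.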

stream: mpsc::Receiver<Record>,
next_err: Option<Error>,
buffer: Vec<Record>,
range: (i64, i64),
tustvold (Contributor):

The interaction with the offset tracking in buffer is perhaps a little funky. Perhaps this might be easier to understand if these were tracked automatically, and we added a delete records API... I dunno 😅

crepererum (Collaborator, Author):

You mean you wanna mock a full Kafka broker? 😉

crepererum and others added 4 commits February 16, 2022 12:38
This is required to provide a race-free way to start at the earliest or
latest offset, because fetching the offset and then creating a stream
might fail because in the meantime the Kafka retention policy might
already have removed the offsets in question.
Co-authored-by: Raphael Taylor-Davies <[email protected]>
@crepererum crepererum force-pushed the crepererum/consumer_stream_earliest_latest branch from 1dc9501 to 36e89b2 Compare February 16, 2022 11:38
@crepererum crepererum force-pushed the crepererum/consumer_stream_earliest_latest branch from 36e89b2 to 51ac55a Compare February 16, 2022 11:40
@crepererum crepererum added the automerge Instruct kodiak to merge the PR label Feb 16, 2022
@kodiakhq kodiakhq bot merged commit f77606e into main Feb 16, 2022