
Conversation

@tustvold tustvold commented Mar 2, 2022

Builds on and is intended to replace #112. This will also replace #111, as it reworks BroadcastOnce to no longer use Shared: its peek implementation is simply unhelpful here.

As reported in https://github.com/influxdata/influxdb_iox/issues/3805, we've been observing occasional panics in production. I eventually tracked this down to a lack of cancellation safety within produce: if the write request times out, the future is cancelled (dropped), causing it to drop the lock and leave the aggregator in an inconsistent state.

@tustvold tustvold changed the title Cancel safety feat: producer cancellation and panic safety Mar 2, 2022
```rust
) -> Result<OwnedMutexGuard<ProducerInner<A>>> {
    debug!(?client, "Flushing batch producer");

    // Spawn a task to provide cancellation safety
```
tustvold (author):

I recommend looking at the diff without whitespace https://github.com/influxdata/rskafka/pull/113/files?w=1

```diff
 thiserror = "1.0"
 time = "0.3"
-tokio = { version = "1.14", default-features = false, features = ["io-util", "net", "rt", "sync", "time"] }
+tokio = { version = "1.14", default-features = false, features = ["io-util", "net", "rt", "sync", "time", "macros"] }
```
tustvold (author):

This is needed for `tokio::select!`

```rust
/// - Receivers can be created with [`BroadcastOnce::receiver`]
/// - The value can be produced with [`BroadcastOnce::broadcast`]
#[derive(Debug)]
pub struct BroadcastOnce<T> {
```
tustvold (author):

As described in #111, Shared::peek has some quirks when used with interior-mutable futures. These can be worked around, but in the interest of keeping things simple I just switched to using Notify and a shared Option.

```rust
// implementation will still signal the result slot preventing flushing twice
let slot = std::mem::take(&mut inner.result_slot);

let (output, status_deagg) = match inner.aggregator.flush() {
```
tustvold (author):

We still have an issue if the aggregator panics and leaves itself in an undefined state, but that's something for a future PR
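One common mitigation for that class of bug (an assumed sketch, not what this PR implements) is to take the buffered state out of the aggregator before doing any fallible work, so a panic mid-flush leaves behind an empty but consistent aggregator rather than a half-drained one:

```rust
use std::mem;

struct Aggregator {
    records: Vec<String>,
}

impl Aggregator {
    fn flush(&mut self) -> Vec<String> {
        // After this line self.records is a fresh empty Vec; even if
        // encoding work that followed were to panic, no partial state
        // would remain behind in the aggregator
        mem::take(&mut self.records)
    }
}
```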

```diff
     compression: Compression,
-) {
-    trace!("Flushing batch producer");
+) -> Result<OwnedMutexGuard<ProducerInner<A>>> {
```
tustvold (author):

This dance of passing a lock in and out of the future is a bit meh, but it is necessary for the correctness of the too-large-record detection: it needs to flush the aggregator and then try to push the record again, without allowing something else to push a new record in the intervening time.

@crepererum crepererum (Collaborator) left a comment:

Only some small nitpicks. Good job! 💪

Comment on lines +376 to +377
```rust
/// The returned future is cancellation safe in that it won't leave the [`BatchProducer`]
/// in an inconsistent state, however, the provided data may or may not be produced
```
crepererum (Collaborator):

So if you have multiple produce calls sharing one aggregated write, does cancelling one of the produce calls have a side effect on the other calls? I THINK (from reading the code and our call yesterday) the answer is "no", but I wanted to double-check.

tustvold (author):

Correct, cancelling one call should no longer be able to side-effect the others.

@tustvold tustvold added the automerge Instruct kodiak to merge the PR label Mar 3, 2022
@kodiakhq kodiakhq bot merged commit 39cd7d8 into main Mar 3, 2022
@kodiakhq kodiakhq bot deleted the cancel-safety branch March 3, 2022 10:32
This was referenced Mar 3, 2022