16 Jun 11:52

edenhill

b47da0e

v1.9.0

librdkafka v1.9.0

librdkafka v1.9.0 is a feature release:

Added KIP-768 OAUTHBEARER OIDC support (by @jliunyu, #3560)
Added KIP-140 Admin API ACL support (by @emasab, #2676)

Upgrade considerations

Consumer:
rd_kafka_offsets_store() (et al.) will now return an error for any
partition that is not currently assigned (through rd_kafka_*assign()).
This prevents a race condition where an application would store offsets
after the assigned partitions had been revoked (which resets the stored
offset). These old stored offsets could then be committed later, when
the same partitions were assigned to this consumer again, effectively
overwriting any offsets committed in the meantime by other consumers that
had been assigned the same partitions. This would typically result in the
offsets rewinding and messages being reprocessed.
As an extra effort to avoid this situation the stored offset is now
also reset when partitions are assigned (through rd_kafka_*assign()).
Applications that explicitly call ..offset*_store() will now need
to handle the case where RD_KAFKA_RESP_ERR__STATE is returned
in the per-partition .err field - meaning the partition is no longer
assigned to this consumer and the offset could not be stored for commit.

Enhancements

Improved producer queue scheduling. Fixes the performance regression
introduced in v1.7.0 for some produce patterns. (#3538, #2912)
Windows: Added native Win32 IO/Queue scheduling. This removes the
internal TCP loopback connections that were previously used for timely
queue wakeups.
Added socket.connection.setup.timeout.ms (default 30s).
The maximum time allowed for broker connection setups (TCP connection as
well as SSL and SASL handshakes) is now limited to this value.
This fixes the issue with stalled broker connections in the case of network
or load balancer problems.
The Java client has an exponential backoff to this timeout which is
limited by socket.connection.setup.timeout.max.ms - this was not
implemented in librdkafka due to differences in connection handling and
ERR__ALL_BROKERS_DOWN error reporting. Starting with a lower initial
connection setup timeout and then increasing the timeout for the next
attempt could yield false-positive ERR__ALL_BROKERS_DOWN errors too early.
SASL OAUTHBEARER refresh callbacks can now be scheduled for execution
on librdkafka's background thread. This solves the problem where an
application has a custom SASL OAUTHBEARER refresh callback and thus needs to
call rd_kafka_poll() (et al.) at least once to trigger the
refresh callback before being able to connect to brokers.
With the new rd_kafka_conf_enable_sasl_queue() configuration API and
rd_kafka_sasl_background_callbacks_enable() the refresh callbacks
can now be triggered automatically on the librdkafka background thread.
rd_kafka_queue_get_background() now creates the background thread
if not already created.
Added rd_kafka_consumer_close_queue() and rd_kafka_consumer_closed().
These allow applications and language bindings to implement asynchronous
consumer close.
Bundled zlib upgraded to version 1.2.12.
Bundled OpenSSL upgraded to 1.1.1n.
Added test.mock.broker.rtt to simulate RTT/latency for mock brokers.

Fixes

General fixes

Fix various 1 second delays due to internal broker threads blocking on IO
even though there are events to handle.
These delays could be seen randomly in any of the non produce/consume
request APIs, such as commit_transaction(), list_groups(), etc.
Windows: some applications would crash with an error message like
no OPENSSL_Applink() written to the console if ssl.keystore.location
was configured.
This regression was introduced in v1.8.0 due to the use of vcpkg and how
the keystore file was read. #3554.
Windows 32-bit only: 64-bit atomic reads were in fact not atomic and could
in rare circumstances yield incorrect values.
One manifestation of this issue was the max.poll.interval.ms consumer
timer expiring even though the application was polling according to profile.
Fixed by @WhiteWind (#3815).
rd_kafka_clusterid() would previously fail with a timeout if
called on a cluster with no visible topics (#3620).
The clusterid is now returned as soon as metadata has been retrieved.
Fix hang in rd_kafka_list_groups() if there are no available brokers
to connect to (#3705).
Millisecond timeouts (timeout_ms) in various APIs, such as rd_kafka_poll(),
were limited to roughly 36 hours before wrapping. (#3034)
If a metadata request triggered by rd_kafka_metadata() or consumer group rebalancing
encountered a non-retriable error it would not be propagated to the caller and thus
cause a stall or timeout. This has now been fixed. (@aiquestion, #3625)
AdminAPI DeleteGroups() and DeleteConsumerGroupOffsets():
if the given coordinator connection was not up by the time these calls were
initiated and the first connection attempt failed then no further connection
attempts were performed, ultimately leading to the calls timing out.
This is now fixed by retrying the connection to the group coordinator
until the connection is successful or the call times out.
Additionally, the coordinator will now be re-queried once per second until
the coordinator comes up or the call times out, to detect changes in
coordinators.
Mock cluster rd_kafka_mock_broker_set_down() would previously
accept and then disconnect new connections, it now refuses new connections.

Consumer fixes

rd_kafka_offsets_store() (et al.) will now return an error for any
partition that is not currently assigned (through rd_kafka_*assign()).
See Upgrade considerations above for more information.
rd_kafka_*assign() will now reset/clear the stored offset.
See Upgrade considerations above for more information.
seek() followed by pause() would overwrite the seeked offset when
later calling resume(). This is now fixed. (#3471).
Note: Avoid storing offsets (offsets_store()) after calling
seek() as this may later interfere with resuming a paused partition.
Instead, store offsets prior to calling seek().
An ERR_MSG_SIZE_TOO_LARGE consumer error would previously be raised
if the consumer received a maximum sized FetchResponse only containing
(transaction) aborted messages with no control messages. The fetching did
not stop, but some applications would terminate upon receiving this error.
No error is now raised in this case. (#2993)
Thanks to @jacobmikesell for providing an application to reproduce the
issue.
The consumer no longer backs off the next fetch request (default 500ms) when
the parsed fetch response is truncated (which is a valid case).
This should speed up the message fetch rate in case of maximum sized
fetch responses.
Fix consumer crash (assert: rkbuf->rkbuf_rkb) when parsing
malformed JoinGroupResponse consumer group metadata state.
Fix crash (cant handle op type) when using consume_batch_queue() (et al.)
and an OAUTHBEARER refresh callback was set.
The callback is now triggered by the consume call. (#3263)
Fix partition.assignment.strategy ordering when multiple strategies are configured.
If there is more than one eligible strategy, preference is determined by the
configured order of strategies. The partitions are assigned to group members according
to the strategy order preference now. (#3818)
Any form of unassign*() (absolute or incremental) is now allowed during
consumer close rebalancing and they're all treated as absolute unassigns.
(@kevinconaway)

Transactional producer fixes

Fix message loss in idempotent/transactional producer.
A corner case has been identified that may cause idempotent/transactional
messages to be lost despite being reported as successfully delivered:
During cluster instability a restarting broker may report existing topics
as non-existent for some time before it is able to acquire up to date
cluster and topic metadata.
If an idempotent/transactional producer updates its topic metadata cache
from such a broker the producer will consider the topic to be removed from
the cluster and thus remove its local partition objects for the given topic.
This also removes the internal message sequence number counter for the given
partitions.
If the producer later receives proper topic metadata for the cluster the
previously "removed" topics will be rediscovered and new partition objects
will be created in the producer. These new partition objects, with no
knowledge of previous incarnations, would start counting partition messages
at zero again.
If new messages were produced for these partitions by the same producer
instance, the same message sequence numbers would be sent to the broker.
If the broker still maintains state for the producer's PID and Epoch it could
deem that these messages with reused sequence numbers had already been
written to the log and treat them as legitimate duplicates.
To the producer it would seem that these new messages were successfully
written to the partition log by the broker when they were in fact discarded
as duplicates, leading to silent message loss.
The fix included in this release is to save the per-partition idempotency
state when a partition is removed, and then recover and use that saved
state if the partition comes back at a later time.
The transactional producer would retry (re)initializing its PID if a
PRODUCER_FENCED error was returned from the
broker (added in Apache K...

Contributors

WhiteWind, aiquestion, and 5 other contributors

25 Nov 08:21

edenhill

v1.6.2

4483be3

librdkafka v1.6.2

librdkafka v1.6.2 is a maintenance release with the following backported fixes:

Upon quick repeated leader changes the transactional producer could receive
an OUT_OF_ORDER_SEQUENCE error from the broker, which triggered an
Epoch bump on the producer resulting in an InitProducerIdRequest being sent
to the transaction coordinator in the middle of a transaction.
This request would start a new transaction on the coordinator, but the
producer would still think (erroneously) it was in the current transaction.
Any messages produced in the current transaction prior to this event would
be silently lost when the application committed the transaction, leading
to message loss.
To avoid message loss a fatal error is now raised.
This fix is specific to v1.6.x. librdkafka v1.8.x implements a recoverable
error state instead. #3575.
The transactional producer could stall during a transaction if the transaction
coordinator changed while adding offsets to the transaction (send_offsets_to_transaction()).
This stall lasted until the coordinator connection went down, the
transaction timed out, the transaction was aborted, or messages were produced
to a new partition, whichever came first. #3571.
librdkafka's internal timers would not start if the timeout was set to 0,
which would result in some timeout operations not being enforced correctly,
e.g., the transactional producer API timeouts.
These timers are now started with a timeout of 1 microsecond.
Force address resolution if the broker epoch changes (#3238).

Checksums

Release asset checksums:

v1.6.2.zip SHA256 1d389a98bda374483a7b08ff5ff39708f5a923e5add88b80b71b078cb2d0c92e
v1.6.2.tar.gz SHA256 b9be26c632265a7db2fdd5ab439f2583d14be08ab44dc2e33138323af60c39db

18 Oct 11:30

edenhill

v1.8.2

063a9ae

librdkafka v1.8.2

librdkafka v1.8.2 is a maintenance release.

Enhancements

Added ssl.ca.pem to add CA certificate by PEM string. (#2380)
Prebuilt binaries for Mac OSX now contain statically linked OpenSSL v1.1.1l.
Previously the OpenSSL version was either v1.1.1 or v1.0.2 depending on
build type.

Fixes

The librdkafka.redist 1.8.0 package had two flaws:
- the linux-arm64 .so build was a linux-x64 build.
- the included Windows MSVC 140 runtimes for x64 were in fact x86.
  The release script has been updated to verify the architectures of
  provided artifacts to avoid this happening in the future.
Prebuilt binaries for Mac OSX Sierra (10.12) and older are no longer provided.
This affects confluent-kafka-go.
Some of the prebuilt binaries for Linux were built on Ubuntu 14.04;
these builds are now performed on Ubuntu 16.04 instead.
This may affect users on ancient Linux distributions.
It was not possible to configure ssl.ca.location on OSX; the property
would automatically revert back to probe (default value).
This regression was introduced in v1.8.0. (#3566)
librdkafka's internal timers would not start if the timeout was set to 0,
which would result in some timeout operations not being enforced correctly,
e.g., the transactional producer API timeouts.
These timers are now started with a timeout of 1 microsecond.

Transactional producer fixes

Upon quick repeated leader changes the transactional producer could receive
an OUT_OF_ORDER_SEQUENCE error from the broker, which triggered an
Epoch bump on the producer resulting in an InitProducerIdRequest being sent
to the transaction coordinator in the middle of a transaction.
This request would start a new transaction on the coordinator, but the
producer would still think (erroneously) it was in the current transaction.
Any messages produced in the current transaction prior to this event would
be silently lost when the application committed the transaction, leading
to message loss.
This has been fixed by setting the Abortable transaction error state
in the producer. #3575.
The transactional producer could stall during a transaction if the transaction
coordinator changed while adding offsets to the transaction (send_offsets_to_transaction()).
This stall lasted until the coordinator connection went down, the
transaction timed out, the transaction was aborted, or messages were produced
to a new partition, whichever came first. #3571.

Checksums

Release asset checksums:

v1.8.2.zip SHA256 8b03d8b650f102f3a6a6cff6eedc29b9e2f68df9ba7e3c0f3fb00838cce794b8
v1.8.2.tar.gz SHA256 6a747d293a7a4613bd2897e28e8791476fbe1ae7361f2530a876e0fd483482a6

Note: there was no v1.8.1 librdkafka release

16 Sep 14:30

edenhill

v1.8.0

9ded5ee

librdkafka v1.8.0

librdkafka v1.8.0 is a security release:

Upgrade bundled zlib version from 1.2.8 to 1.2.11 in the librdkafka.redist
NuGet package. The updated zlib version fixes CVEs:
CVE-2016-9840, CVE-2016-9841, CVE-2016-9842, CVE-2016-9843
See #2934 for more information.
librdkafka now uses vcpkg for up-to-date Windows
dependencies in the librdkafka.redist NuGet package:
OpenSSL 1.1.1l, zlib 1.2.11, zstd 1.5.0.
The upstream dependency (OpenSSL, zstd, zlib) source archive checksums are
now verified when building with ./configure --install-deps.
These builds are used by the librdkafka builds bundled with
confluent-kafka-go, confluent-kafka-python and confluent-kafka-dotnet.

Enhancements

Producer flush() now overrides the linger.ms setting for the duration
of the flush() call, effectively triggering immediate transmission of
queued messages. (#3489)

Fixes

General fixes

Correctly detect presence of zlib via compilation check. (Chris Novakovic)
ERR__ALL_BROKERS_DOWN is no longer emitted when the coordinator
connection goes down, only when all standard named brokers have been tried.
This fixes the issue with ERR__ALL_BROKERS_DOWN being triggered on
consumer_close(). It is also now only emitted if the connection was fully
up (past handshake), and not just connected.
rd_kafka_query_watermark_offsets(), rd_kafka_offsets_for_times(),
consumer_lag metric, and auto.offset.reset now honour
isolation.level and will return the Last Stable Offset (LSO)
when isolation.level is set to read_committed (default), rather than
the uncommitted high-watermark when it is set to read_uncommitted. (#3423)
SASL GSSAPI is now usable when sasl.kerberos.min.time.before.relogin
is set to 0 - which disables ticket refreshes (by @mpekalski, #3431).
Rename internal crc32c() symbol to rd_crc32c() to avoid conflict with
other static libraries (#3421).
txidle and rxidle in the statistics object were emitted as 18446744073709551615 when no idle was known. -1 is now emitted instead. (#3519)
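The value 18446744073709551615 in the txidle/rxidle fix above is what -1 looks like when it is formatted as an unsigned 64-bit integer (2^64 - 1). The sketch below shows only that arithmetic. The function names are hypothetical and this is not how librdkafka emits its statistics.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Formats a "no value known" sentinel of -1 as an unsigned 64-bit integer.
 * Returns the number of characters written (excluding the terminator). */
int format_idle_unsigned(char *buf, size_t size) {
        uint64_t v = (uint64_t)-1; /* -1 wraps around to 2^64 - 1 */
        assert(buf != NULL);
        return snprintf(buf, size, "%" PRIu64, v);
}

/* Formats the same sentinel as a signed 64-bit integer. */
int format_idle_signed(char *buf, size_t size) {
        int64_t v = -1;
        assert(buf != NULL);
        return snprintf(buf, size, "%" PRId64, v);
}
```

Consumers of the statistics JSON that parsed these fields into a signed 64-bit type may have failed or overflowed on the old value, while -1 is an unambiguous marker for "unknown".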

Consumer fixes

Automatically retry offset commits on ERR_REQUEST_TIMED_OUT,
ERR_COORDINATOR_NOT_AVAILABLE, and ERR_NOT_COORDINATOR (#3398).
Offset commits will be retried twice.
Timed auto commits did not work when only using assign() and not subscribe().
This regression was introduced in v1.7.0.
If the topics matching the current subscription changed (or the application
updated the subscription) while there was an outstanding JoinGroup or
SyncGroup request, an additional request would sometimes be sent before
handling the response of the first. This in turn led to internal state
issues that could cause a crash or misbehaviour.
The consumer will now wait for any outstanding JoinGroup or SyncGroup
responses before re-joining the group.
auto.offset.reset could previously be triggered by temporary errors,
such as disconnects and timeouts (after the two retries are exhausted).
This is now fixed so that the auto offset reset policy is only triggered
for permanent errors.
The error that triggers auto.offset.reset is now logged to help the
application owner identify the reason for the reset.
If a rebalance takes longer than a consumer's session.timeout.ms, the
consumer will remain in the group as long as it receives heartbeat responses
from the broker.

Admin fixes

DeleteRecords() could crash if one of the underlying requests
(for a given partition leader) failed at the transport level (e.g., timeout).
(#3476).

Checksums

Release asset checksums:

v1.8.0.zip SHA256 4b173f759ea5fdbc849fdad00d3a836b973f76cbd3aa8333290f0398fd07a1c4
v1.8.0.tar.gz SHA256 93b12f554fa1c8393ce49ab52812a5f63e264d9af6a50fd6e6c318c481838b7f

Contributors

mpekalski

10 May 09:28

edenhill

v1.7.0

77a013b

librdkafka v1.7.0

librdkafka v1.7.0 is a feature release:

KIP-360 - Improve reliability of transactional producer.
Requires Apache Kafka 2.5 or later.
OpenSSL Engine support (ssl.engine.location) by @adinigam and @ajbarb.

Enhancements

Added connections.max.idle.ms to automatically close idle broker
connections.
This feature is disabled by default unless bootstrap.servers contains
the string azure in which case the default is set to <4 minutes to improve
connection reliability and circumvent limitations with the Azure load
balancers (see #3109 for more information).
Bumped to OpenSSL 1.1.1k in binary librdkafka artifacts.
The binary librdkafka artifacts for Alpine are now using Alpine 3.12.
Improved static librdkafka Windows builds using MinGW (@neptoess, #3130).

Upgrade considerations

The C++ oauthbearer_token_refresh_cb() was missing a Handle *
argument that has now been added. This is a breaking change but the original
function signature is considered a bug.
This change only affects C++ OAuth developers.
KIP-735 The consumer session.timeout.ms
default was changed from 10 to 45 seconds to make consumer groups more
robust and less sensitive to temporary network and cluster issues.
Statistics: consumer_lag is now using the committed_offset,
while the new consumer_lag_stored is using stored_offset
(offset to be committed).
This is more correct than the previous consumer_lag which was using
either committed_offset or app_offset (last message passed
to application).

Fixes

General fixes

Fix accesses to freed metadata cache mutexes on client termination (#3279)
There was a race condition on receiving updated metadata where a broker id
update (such as bootstrap to proper broker transformation) could finish after
the topic metadata cache was updated, leading to existing brokers seemingly
being not available.
One occurrence of this issue was query_watermark_offsets() that could return
ERR__UNKNOWN_PARTITION for existing partitions shortly after the
client instance was created.
The OpenSSL context is now initialized with TLS_client_method()
(on OpenSSL >= 1.1.0) instead of the deprecated and outdated
SSLv23_client_method().
The initial cluster connection on client instance creation could sometimes
be delayed up to 1 second if a group.id or transactional.id
was configured (#3305).
Speed up triggering of new broker connections in certain cases by exiting
the broker thread io/op poll loop when a wakeup op is received.
SASL GSSAPI: The Kerberos kinit refresh command was triggered from
rd_kafka_new() which made this call blocking if the refresh command
took a long time. The refresh is now performed by the background rdkafka
main thread.
Fix busy-loop (100% CPU on the broker threads) during the handshake phase
of an SSL connection.
Disconnects during SSL handshake are now propagated as transport errors
rather than SSL errors, since these disconnects are at the transport level
(e.g., incorrect listener, flaky load balancer, etc) and not due to SSL
issues.
Increment metadata fast refresh interval backoff exponentially (@ajbarb, #3237).
Unthrottled requests are no longer counted in the brokers[].throttle
statistics object.
Log CONFWARN warning when global topic configuration properties
are overwritten by explicitly setting a default_topic_conf.

Consumer fixes

If a rebalance happened during a consume_batch..() call the already
accumulated messages for revoked partitions were not purged, which would
pass messages to the application for partitions that were no longer owned
by the consumer. Fixed by @jliunyu. #3340.
Fix balancing and reassignment issues with the cooperative-sticky assignor.
#3306.
Fix incorrect detection of first rebalance in sticky assignor (@hallfox).
Aborted transactions with no messages produced to a partition could
cause further successfully committed messages in the same Fetch response to
be ignored, resulting in consumer-side message loss.
A log message along the lines Abort txn ctrl msg bad order at offset 7501: expected before or at 7702: messages in aborted transactions may be delivered to the application
would be seen.
This is a rare occurrence where a transactional producer would register with
the partition but not produce any messages before aborting the transaction.
The consumer group deemed cached metadata up to date by checking
topic.metadata.refresh.interval.ms: if this property was set too low
it would cause cached metadata to be unusable and new metadata to be fetched,
which could delay the time it took for a rebalance to settle.
It now correctly uses metadata.max.age.ms instead.
The consumer group timed auto commit would attempt commits during rebalances,
which could result in "Illegal generation" errors. This is now fixed: the
timed auto committer is only employed in the steady state when no rebalances
are taking place. Offsets are still auto committed when partitions are
revoked.
Retriable FindCoordinatorRequest errors are no longer propagated to
the application as they are retried automatically.
Fix rare crash (assert rktp_started) on consumer termination
(introduced in v1.6.0).
Fix unaligned access and possibly corrupted snappy decompression when
building with MSVC (@azat)
A consumer configured with the cooperative-sticky assignor did
not actively Leave the group on unsubscribe(). This delayed the
rebalance for the remaining group members by up to session.timeout.ms.
The current subscription list was sometimes leaked when unsubscribing.

Producer fixes

The timeout value of flush() was not respected when delivery reports
were scheduled as events (such as for confluent-kafka-go) rather than
callbacks.
There was a race condition in purge() which could cause newly
created partition objects, or partitions that were changing leaders, to
not have their message queues purged. This could cause
abort_transaction() to time out. This issue is now fixed.
In certain high-throughput produce rate patterns producing could stall for
1 second, regardless of linger.ms, due to rate-limiting of internal
queue wakeups. This is now fixed by not rate-limiting queue wakeups but
instead limiting them to one wakeup per queue reader poll. #2912.

Transactional Producer fixes

KIP-360: Fatal Idempotent producer errors are now recoverable by the
transactional producer and will raise a txn_requires_abort() error.
If the cluster went down between produce() and commit_transaction()
and before any partitions had been registered with the coordinator, the
messages would time out but the commit would succeed because nothing
had been sent to the coordinator. This is now fixed.
If the current transaction failed while commit_transaction() was
checking the current transaction state an invalid state transition could
occur which in turn would trigger an assertion crash.
This issue showed up as "Invalid txn state transition: .." crashes, and is
now fixed by properly synchronizing both checking and transition of state.

24 Feb 13:39

edenhill

v1.6.1

1a72255

librdkafka v1.6.1

librdkafka v1.6.1 is a maintenance release.

Upgrade considerations

Fatal idempotent producer errors are now also fatal to the transactional
producer. This is a necessary step to maintain data integrity prior to
librdkafka supporting KIP-360. Applications should check any transactional
API errors for the is_fatal flag and decommission the transactional producer
if the flag is set.
The consumer error raised by auto.offset.reset=error now has error-code
set to ERR__AUTO_OFFSET_RESET to allow an application to differentiate
between auto offset resets and other consumer errors.

Fixes

General fixes

Admin API and transactional send_offsets_to_transaction() coordinator
requests, such as TxnOffsetCommitRequest, could in rare cases be sent
multiple times which could cause a crash.
ssl.ca.location=probe is now enabled by default on Mac OSX since the
librdkafka-bundled OpenSSL might not have the same default CA search paths
as the system or brew installed OpenSSL. Probing scans all known locations.

Transactional Producer fixes

Fatal idempotent producer errors are now also fatal to the transactional
producer.
The transactional producer could crash if the transaction failed while
send_offsets_to_transaction() was called.
Group coordinator requests for transactional
send_offsets_to_transaction() calls would leak memory if the
underlying request was attempted to be sent after the transaction had
failed.
When gradually producing to multiple partitions (resulting in multiple
underlying AddPartitionsToTxnRequests) subsequent partitions could get
stuck in pending state under certain conditions. These pending partitions
would not send queued messages to the broker and eventually trigger
message timeouts, failing the current transaction. This is now fixed.
Committing an empty transaction (no messages were produced and no
offsets were sent) would previously raise a fatal error due to invalid state
on the transaction coordinator. We now allow empty/no-op transactions to
be committed.

Consumer fixes

The consumer will now retry indefinitely (or until the assignment is changed)
to retrieve committed offsets. This fixes the issue where only two retries
were attempted when outstanding transactions were blocking OffsetFetch
requests with ERR_UNSTABLE_OFFSET_COMMIT. #3265

26 Jan 16:20

edenhill

v1.6.0

7fe18e4

librdkafka v1.6.0

librdkafka v1.6.0 is a feature release:

KIP-429 Incremental rebalancing with sticky consumer group partition assignor (KIP-54) (by @mhowlett).
KIP-480 Sticky producer partitioning (sticky.partitioning.linger.ms) - achieves higher throughput and lower latency through sticky selection of random partition (by @abbycriswell).
AdminAPI: Add support for DeleteRecords(), DeleteGroups() and DeleteConsumerGroupOffsets() (by @gridaphobe)
KIP-447 Producer scalability for exactly once semantics - allows a single transactional producer to be used for multiple input partitions. Requires Apache Kafka 2.5 or later.
Transactional producer fixes and improvements, see Transactional Producer fixes below.
The librdkafka.redist NuGet package now supports Linux ARM64/Aarch64.

Upgrade considerations

Sticky producer partitioning (sticky.partitioning.linger.ms) is
enabled by default (10 milliseconds), which affects the distribution of
randomly partitioned messages: where these messages were previously
evenly distributed over the available partitions, they are now partitioned
to a single partition for the duration of the sticky time
(10 milliseconds by default) before a new random sticky partition
is selected.
The new KIP-447 transactional producer scalability guarantees are only
supported on Apache Kafka 2.5 or later; on earlier releases you will
need to use one producer per input partition for EOS. This limitation
is not enforced by the producer or broker.
Error handling for the transactional producer has been improved, see
the Transactional Producer fixes below for more information.

Known issues

The Transactional Producer's API timeout handling is inconsistent with the
underlying protocol requests; it is therefore strongly recommended that
applications call rd_kafka_commit_transaction() and
rd_kafka_abort_transaction() with the timeout_ms parameter
set to -1, which will use the remaining transaction timeout.

Enhancements

KIP-107, KIP-204: AdminAPI: Added DeleteRecords() (by @gridaphobe).
KIP-229: AdminAPI: Added DeleteGroups() (by @gridaphobe).
KIP-496: AdminAPI: Added DeleteConsumerGroupOffsets().
KIP-464: AdminAPI: Added support for broker-side default partition count
and replication factor for CreateTopics().
Windows: Added ssl.ca.certificate.stores to specify a list of
Windows Certificate Stores to read CA certificates from, e.g.,
CA,Root. Root remains the default store.
Use reentrant rand_r() on supporting platforms which decreases lock
contention (@azat).
Added assignor debug context for troubleshooting consumer partition
assignments.
Updated to OpenSSL v1.1.1i when building dependencies.
Update bundled lz4 (used when ./configure --disable-lz4-ext) to v1.9.3
which has vast performance improvements.
Added rd_kafka_conf_get_default_topic_conf() to retrieve the
default topic configuration object from a global configuration object.
Added conf debugging context to debug - shows set configuration
properties on client and topic instantiation. Sensitive properties
are redacted.
Added rd_kafka_queue_yield() to cancel a blocking queue call.
Will now log a warning when multiple ClusterIds are seen, which is an
indication that the client might be erroneously configured to connect to
multiple clusters which is not supported.
Added rd_kafka_seek_partitions() to seek multiple partitions to
per-partition specific offsets.

Fixes

General fixes

Fix a use-after-free crash when certain coordinator requests were retried.
The C++ oauthbearer_set_token() function would call free() on
a pointer allocated with new, possibly leading to crashes or heap corruption (#3194)

Consumer fixes

The consumer assignment and consumer group implementations have been
decoupled, simplified and made more strict and robust. This will sort out
a number of edge cases for the consumer where the behaviour was previously
undefined.
Partition fetch state was not set to STOPPED if OffsetCommit failed.
The session timeout is now enforced locally also when the coordinator
connection is down, which was not previously the case.

Transactional Producer fixes

Transaction commit or abort failures on the broker, such as when the
producer was fenced by a newer instance, were not propagated to the
application resulting in failed commits seeming successful.
This was a critical race condition for applications that had a delay after
producing messages (or sending offsets) before committing or
aborting the transaction. This issue has now been fixed and test coverage
improved.
The transactional producer API would return RD_KAFKA_RESP_ERR__STATE
when API calls were attempted after the transaction had failed. We now
try to return the error that caused the transaction to fail in the first
place, such as RD_KAFKA_RESP_ERR__FENCED when the producer has
been fenced, or RD_KAFKA_RESP_ERR__TIMED_OUT when the transaction
has timed out.
Transactional producer retry count for transactional control protocol
requests has been increased from 3 to infinite; retriable errors
are now automatically retried by the producer until success or the
transaction timeout is exceeded. This fixes the case where
rd_kafka_send_offsets_to_transaction() would fail the current
transaction into an abortable state when CONCURRENT_TRANSACTIONS was
returned by the broker (which is a transient error) and the 3 retries
were exhausted.

Producer fixes

Calling rd_kafka_topic_new() with a topic config object with
message.timeout.ms set could sometimes adjust the global linger.ms
property (if not explicitly configured), which was not desired. This is now
fixed and the auto adjustment is only done based on the
default_topic_conf at producer creation.
rd_kafka_flush() could previously return RD_KAFKA_RESP_ERR__TIMED_OUT
just as the timeout was reached, even if the messages had been flushed and
there were no more messages left. This has been fixed.

Checksums

Release asset checksums:

v1.6.0.zip SHA256 af6f301a1c35abb8ad2bb0bab0e8919957be26c03a9a10f833c8f97d6c405aa8
v1.6.0.tar.gz SHA256 3130cbd391ef683dc9acf9f83fe82ff93b8730a1a34d0518e93c250929be9f6b


09 Dec 10:03

edenhill

v1.5.3

6283304

v1.5.3

librdkafka v1.5.3

librdkafka v1.5.3 is a maintenance release.

Upgrade considerations

CentOS 6 is now EOL and is no longer included in binary librdkafka packages,
such as NuGet.

Fixes

General fixes

Fix a use-after-free crash when certain coordinator requests were retried.

Consumer fixes

Consumer would not filter out messages for aborted transactions
if the messages were compressed (#3020).
Consumer destroy without prior close() could hang in certain
cgrp states (@gridaphobe, #3127).
Fix possible null dereference in Message::errstr() (#3140).
The roundrobin partition assignment strategy could get stuck in an
endless loop or generate uneven assignments in case the group members
had asymmetric subscriptions (e.g., c1 subscribes to t1,t2 while c2
subscribes to t2,t3). (#3159)
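
As a rough illustration of the asymmetric-subscription case above, the following self-contained C sketch assigns partitions round-robin while skipping members that do not subscribe to a partition's topic. It is not librdkafka's assignor code: the member_t and toppar_t types and the roundrobin_assign() function are hypothetical names introduced for this example. The inner scan is limited to one lap over the members, so it terminates even when no member subscribes to a topic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SUBS 8

/* Hypothetical types for illustration; not librdkafka's internal structures. */
typedef struct {
    const char *id;
    const char *topics[MAX_SUBS]; /* subscribed topic names */
    size_t n_topics;
} member_t;

typedef struct {
    const char *topic;
    int partition;
} toppar_t;

/* Return true if member m subscribes to the given topic. */
bool member_subscribes(const member_t *m, const char *topic) {
    for (size_t i = 0; i < m->n_topics; i++)
        if (strcmp(m->topics[i], topic) == 0)
            return true;
    return false;
}

/*
 * Assign each partition to the next member, in circular order, that
 * subscribes to the partition's topic. owner[p] receives the member index
 * for parts[p], or -1 if no member subscribes to that topic.
 * The scan visits each member at most once per partition, so it always ends.
 */
void roundrobin_assign(const member_t *members, size_t n_members,
                       const toppar_t *parts, size_t n_parts, int *owner) {
    size_t next = 0;

    assert(n_members > 0);
    for (size_t p = 0; p < n_parts; p++) {
        owner[p] = -1;
        for (size_t tries = 0; tries < n_members; tries++) {
            size_t m = (next + tries) % n_members;
            if (member_subscribes(&members[m], parts[p].topic)) {
                owner[p] = (int)m;
                next = (m + 1) % n_members; /* continue after this member */
                break;
            }
        }
    }
}
```

With c1 subscribed to t1,t2 and c2 subscribed to t2,t3, and the partitions t1[0], t2[0], t2[1], t3[0] processed in that order, this sketch gives c1 the partitions t1[0] and t2[1], and c2 the partitions t2[0] and t3[0].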

Checksums

Release asset checksums:

v1.5.3.zip SHA256 3f24271232a42f2d5ac8aab3ab1a5ddbf305f9a1ae223c840d17c221d12fe4c1
v1.5.3.tar.gz SHA256 2105ca01fef5beca10c9f010bc50342b15d5ce6b73b2489b012e6d09a008b7bf


20 Oct 08:33

edenhill

v1.5.2

dbafbb7

v1.5.2

librdkafka v1.5.2

librdkafka v1.5.2 is a maintenance release.

Upgrade considerations

The default value for the producer configuration property retries has
been increased from 2 to infinity, effectively limiting Produce retries to
only message.timeout.ms.
As the reasons for the automatic internal retries vary (various broker error
codes as well as transport layer issues), it doesn't make much sense to limit
the number of retries for retriable errors, but instead only limit the
retries based on the allowed time to produce a message.
The default value for the producer configuration property
request.timeout.ms has been increased from 5 to 30 seconds to match
the Apache Kafka Java producer default.
This change yields increased robustness for broker-side congestion.
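
The reasoning behind the retries change can be sketched in plain C. This is a simulation under stated assumptions, not librdkafka code: send_with_time_budget(), attempt_fn and fail_n_times() are hypothetical names, time is modeled by charging a fixed cost per attempt, and the timeout_ms argument stands in for message.timeout.ms.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome codes for one simulated send attempt. */
typedef enum { ATTEMPT_OK, ATTEMPT_RETRIABLE, ATTEMPT_PERMANENT } attempt_result_t;

/* Hypothetical callback: performs one attempt; attempt_no starts at 0. */
typedef attempt_result_t (*attempt_fn)(int attempt_no, void *opaque);

typedef struct {
    bool delivered;
    int attempts;
    long elapsed_ms;
} send_outcome_t;

/*
 * Retry retriable failures with no cap on the attempt count. The only limit
 * is the time budget. Time is simulated: every attempt is assumed to cost
 * attempt_cost_ms, and a new attempt is started only if it fits the budget.
 */
send_outcome_t send_with_time_budget(attempt_fn attempt, void *opaque,
                                     long timeout_ms, long attempt_cost_ms) {
    send_outcome_t out = { false, 0, 0 };

    assert(attempt_cost_ms > 0);
    while (out.elapsed_ms + attempt_cost_ms <= timeout_ms) {
        attempt_result_t r = attempt(out.attempts, opaque);
        out.attempts++;
        out.elapsed_ms += attempt_cost_ms;
        if (r == ATTEMPT_OK) {
            out.delivered = true;
            break;
        }
        if (r == ATTEMPT_PERMANENT)
            break; /* non-retriable errors fail immediately */
    }
    return out;
}

/* Example attempt function: returns a retriable error for the first
 * *(int *)opaque attempts, then succeeds. */
attempt_result_t fail_n_times(int attempt_no, void *opaque) {
    int failures = *(int *)opaque;
    return attempt_no < failures ? ATTEMPT_RETRIABLE : ATTEMPT_OK;
}
```

Under these assumptions, a message that hits five retriable failures in a row is still delivered on the sixth attempt as long as the time budget allows it, whereas a fixed limit of 2 retries would have given up after the third attempt.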

Enhancements

The generated CONFIGURATION.md (through rd_kafka_conf_properties_show())
now includes all properties and values, regardless of whether they were
included in the build. Setting a disabled property or value through
rd_kafka_conf_set() now returns RD_KAFKA_CONF_INVALID and provides
a more useful error string saying why the property can't be set.
Consumer configs on producers and vice versa will now be logged with
warning messages on client instantiation.

Fixes

Security fixes

There was an incorrect call to zlib's inflateGetHeader() with
uninitialized memory pointers that could lead to the GZIP header of a fetched
message batch being copied to arbitrary memory.
This function call has now been completely removed since the result was
not used.
Reported by Ilja van Sprundel.

General fixes

rd_kafka_topic_opaque() (used by the C++ API) would cause object
refcounting issues when used on light-weight (error-only) topic objects
such as consumer errors (#2693).
Handle name resolution failures when formatting IP addresses in error logs,
and increase printed hostname limit to ~256 bytes (was ~60).
Broker sockets would be closed twice (thus leading to a potential race
condition with fd-reuse in other threads) if a custom socket_cb
returned an error.

Consumer fixes

The roundrobin partition.assignment.strategy could crash (assert)
for certain combinations of members and partitions.
This is a regression in v1.5.0. (#3024)
The C++ KafkaConsumer destructor did not destroy the underlying
C rd_kafka_t instance, causing a leak if close() was not used.
Expose rich error strings for C++ Consumer Message->errstr().
The consumer could get stuck if an outstanding commit failed during
rebalancing (#2933).
Topic authorization errors during fetching are now reported only once (#3072).

Producer fixes

Topic authorization errors are now properly propagated for produced messages,
both through delivery reports and as the ERR_TOPIC_AUTHORIZATION_FAILED
return value from produce*() (#2215).
Treat cluster authentication failures as fatal in the transactional
producer (#2994).
The transactional producer code did not properly reference-count partition
objects which could in very rare circumstances lead to a use-after-free bug
if a topic was deleted from the cluster when a transaction was using it.
ERR_KAFKA_STORAGE_ERROR is now correctly treated as a retriable
produce error (#3026).
Messages that timed out locally would not fail the ongoing transaction.
If the application did not take action on failed messages in its delivery
report callback and went on to commit the transaction, the transaction would
be successfully committed, simply omitting the failed messages.
EndTxnRequests (sent on commit/abort) are only retried in allowed
states (#3041).
Previously the transaction could hang on commit_transaction() if an abortable
error was hit and the EndTxnRequest was to be retried.

Note: there was no v1.5.1 librdkafka release

Checksums

Release asset checksums:

v1.5.2.zip SHA256 de70ebdb74c7ef8c913e9a555e6985bcd4b96eb0c8904572f3c578808e0992e1
v1.5.2.tar.gz SHA256 ca3db90d04ef81ca791e55e9eed67e004b547b7adedf11df6c24ac377d4840c6


20 Jul 12:43

edenhill

v1.5.0

39796d3

v1.5.0

librdkafka v1.5.0

The v1.5.0 release brings usability improvements, enhancements and fixes to
librdkafka.

Enhancements

Improved broker connection error reporting with more useful information and
hints on the cause of the problem.
Consumer: Propagate errors when subscribing to unavailable topics (#1540)
Producer: Add batch.size producer configuration property (#638)
Add topic.metadata.propagation.max.ms to allow newly manually created
topics to be propagated throughout the cluster before reporting them
as non-existent. This fixes race issues where CreateTopics() is
quickly followed by produce().
Prefer the least idle connection for periodic metadata refreshes, et al.,
to allow truly idle connections to time out and to avoid load-balancer-killed
idle connection errors (#2845)
Added rd_kafka_event_debug_contexts() to get the debug contexts for
a debug log line (by @wolfchimneyrock).
Added test scenarios which define the cluster configuration.
Added MinGW-w64 builds (@ed-alertedh, #2553)
./configure --enable-XYZ now requires the XYZ check to pass,
and --disable-XYZ disables the feature altogether (@benesch)
Added rd_kafka_produceva() which takes an array of produce arguments
for situations where the existing rd_kafka_producev() va-arg approach
can't be used.
Added rd_kafka_message_broker_id() to see the broker that a message
was produced or fetched from, or an error was associated with.
Added RTT/delay simulation to mock brokers.
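
The "least idle connection" preference listed above can be pictured with a small C sketch. It assumes a hypothetical conn_t record carrying a last-activity timestamp; neither the type nor pick_least_idle() is librdkafka API. Periodic requests are sent on the connection that was used most recently, which leaves the connections that are truly idle untouched so they can reach their idle timeout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection record; not librdkafka's broker structure. */
typedef struct {
    int broker_id;
    long last_activity_ms; /* timestamp of the most recent request or response */
} conn_t;

/* Return the index of the connection with the most recent activity,
 * that is, the one that has been idle for the shortest time. */
size_t pick_least_idle(const conn_t *conns, size_t n) {
    size_t best = 0;

    assert(conns != NULL && n > 0);
    for (size_t i = 1; i < n; i++)
        if (conns[i].last_activity_ms > conns[best].last_activity_ms)
            best = i;
    return best;
}
```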

Upgrade considerations

Subscribing to non-existent and unauthorized topics will now propagate
errors RD_KAFKA_RESP_ERR_UNKNOWN_TOPIC_OR_PART and
RD_KAFKA_RESP_ERR_TOPIC_AUTHORIZATION_FAILED to the application through
the standard consumer error (the err field in the message object).
The consumer will no longer trigger auto creation of topics;
allow.auto.create.topics=true may be used to re-enable the old deprecated
functionality.
The default consumer pre-fetch queue threshold queued.max.messages.kbytes
has been decreased from 1GB to 64MB to avoid excessive network usage for low
and medium throughput consumer applications. High throughput consumer
applications may need to manually set this property to a higher value.
The default consumer Fetch wait time has been increased from 100ms to 500ms
to avoid excessive network usage for low throughput topics.
If OpenSSL is linked statically, or ssl.ca.location=probe is configured,
librdkafka will probe known CA certificate paths and automatically use the
first one found. This should alleviate the need to configure
ssl.ca.location when the statically linked OpenSSL's OPENSSLDIR differs
from the system's CA certificate path.
The heuristics for handling Apache Kafka < 0.10 brokers have been removed to
improve connection error handling for modern Kafka versions.
Users on brokers 0.9.x or older should already be configuring
api.version.request=false and broker.version.fallback=... so there
should be no functional change.
The default producer batch accumulation time, linger.ms, has been changed
from 0.5ms to 5ms to improve batch sizes and throughput while reducing
the per-message protocol overhead.
Applications that require lower produce latency than 5ms will need to
manually set linger.ms to a lower value.
librdkafka's build tooling now requires Python 3.x (python3 interpreter).
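
The CA certificate probing described above amounts to "try a list of candidate paths and use the first one that works". The following C sketch shows that idea only. probe_first_readable() is a hypothetical helper, not a librdkafka function, and the candidate list is supplied by the caller rather than being the list librdkafka actually probes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Return the first path in candidates[] that can be opened for reading,
 * or NULL if none can. The returned pointer is one of the caller's own
 * strings, so nothing needs to be freed.
 */
const char *probe_first_readable(const char *const *candidates, size_t n) {
    assert(candidates != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        FILE *fp = fopen(candidates[i], "rb");
        if (fp != NULL) {
            fclose(fp);
            return candidates[i]; /* first match wins, later paths are ignored */
        }
    }
    return NULL;
}
```

A caller might pass common distribution locations such as /etc/ssl/certs/ca-certificates.crt and /etc/pki/tls/certs/ca-bundle.crt (given here as examples only) and fall back to an explicit ssl.ca.location setting when the function returns NULL.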

Fixes

General fixes

The client could crash in rare circumstances on ApiVersion or
SaslHandshake request timeouts (#2326)
./configure arguments containing =, such as --LDFLAGS='a=b, c=d', are now
supported (by @sky92zwq).
./configure arguments now take precedence over cached configure variables
from previous invocation.
Fix theoretical crash on coord request failure.
Unknown partition error could be triggered for existing partitions when
additional partitions were added to a topic (@benesch, #2915)
Quickly refresh topic metadata for desired but non-existent partitions.
This will speed up the initial discovery delay when new partitions are added
to an existing topic (#2917).

Consumer fixes

The roundrobin partition assignor could crash if subscriptions
were asymmetrical (different sets from different members of the group).
Thanks to @ankon and @wilmai for identifying the root cause (#2121).
The consumer assignors could ignore some topics if there were more subscribed
topics than consumers taking part in the assignment.
The consumer would connect to all partition leaders of a topic even
for partitions that were not being consumed (#2826).
Initial consumer group joins should now be a couple of seconds quicker
thanks to expedited query intervals (@benesch).
Fix crash and/or inconsistent subscriptions when using multiple consumers
(in the same process) with wildcard topics on Windows.
Don't propagate temporary offset lookup errors to the application.
Immediately refresh topic metadata when partitions are reassigned to other
brokers, avoiding a fetch stall of up to topic.metadata.refresh.interval.ms. (#2955)
Memory for batches containing control messages would not be freed when
using the batch consume APIs (@pf-qiu, #2990).

Producer fixes

Proper locking for transaction state in EndTxn handler.

Checksums

Release asset checksums:

v1.5.0.zip SHA256 76a1e83d643405dd1c0e3e62c7872b74e3a96c52be910233e8ec02d501fa33c8
v1.5.0.tar.gz SHA256 f7fee59fdbf1286ec23ef0b35b2dfb41031c8727c90ced6435b8cf576f23a656


Releases: confluentinc/librdkafka
