It’s a widely held belief that Kafka guarantees ordering within a partition. The official Kafka documentation claims the same; below is an excerpt from it.
Topics are partitioned, meaning a topic is spread over a number of “buckets” located on different Kafka brokers. This distributed placement of your data is very important for scalability because it allows client applications to both read and write the data from/to many brokers at the same time. When a new event is published to a topic, it is actually appended to one of the topic’s partitions. Events with the same event key (e.g., a customer or vehicle ID) are written to the same partition, and Kafka guarantees that any consumer of a given topic-partition will always read that partition’s events in exactly the same order as they were written.
Figure: This example topic has four partitions P1–P4. Two different producer clients are publishing, independently from each other, new events to the topic by writing events over the network to the topic’s partitions. Events with the same key (denoted by their color in the figure) are written to the same partition. Note that both producers can write to the same partition if appropriate.
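The key-to-partition rule in this excerpt can be sketched in a few lines. This sketch assumes a stand-in hash (zlib.crc32). Kafka's Java client actually hashes the serialized key with murmur2, so the partition numbers here will not match a real cluster. The helper partition_for is hypothetical. What carries over is the property the excerpt relies on: the same key always maps to the same partition, so one key's events stay in write order inside that partition.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition (hypothetical helper).

    Stand-in only: Kafka's Java client uses murmur2 on the serialized key,
    not CRC32, so real partition numbers differ. The point is determinism.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Events written in this order by a single producer.
events = [
    ("customer-1", "created"),
    ("customer-2", "created"),
    ("customer-1", "updated"),
    ("customer-1", "deleted"),
]

# Each partition is modelled as an append-only list.
partitions: dict[int, list[tuple[str, str]]] = {}
for key, value in events:
    partitions.setdefault(partition_for(key, 4), []).append((key, value))

for p, records in sorted(partitions.items()):
    print(p, records)
```

Ordering is only per partition: customer-1 and customer-2 may land in different partitions, and nothing orders them relative to each other.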
So, where is the problem?
If you run Kafka with the default producer configuration (in versions before Kafka 3.0, after which the producer enables idempotence by default), a message that was produced earlier can end up appended after a message that was produced later.
No, I don’t want to break your heart, but it’s the bitter truth. Let me unravel this mystery.
The default value of the producer’s retries is Integer.MAX_VALUE, and the default delivery timeout (delivery.timeout.ms) is 2 minutes. If a message is not acknowledged, the producer keeps resending it until it succeeds, the retries are exhausted, or the timeout expires.
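The retry rule above can be modelled as a small loop. This is a simplified sketch, not the client’s implementation. It assumes a fixed 100 ms backoff between attempts and ignores the time spent waiting for each response, which the real delivery timeout also counts. The function send_with_retries and its attempt callback are hypothetical.

```python
from typing import Callable

RETRIES = 2**31 - 1            # Integer.MAX_VALUE, the default quoted above
DELIVERY_TIMEOUT_MS = 120_000  # 2 minutes


def send_with_retries(
    attempt: Callable[[], bool],
    retries: int = RETRIES,
    delivery_timeout_ms: int = DELIVERY_TIMEOUT_MS,
    backoff_ms: int = 100,  # assumed fixed backoff; response wait time is ignored
) -> str:
    """Resend until acknowledged, retries are exhausted, or the timeout expires."""
    elapsed_ms = 0
    failures = 0
    while True:
        if attempt():               # True means the broker acknowledged the send
            return "acked"
        failures += 1
        elapsed_ms += backoff_ms
        if failures > retries:      # first send plus `retries` resends all failed
            return "retries exhausted"
        if elapsed_ms >= delivery_timeout_ms:
            return "delivery timeout"


def always_fails() -> bool:
    return False


print(send_with_retries(always_fails))             # prints: delivery timeout
print(send_with_retries(always_fails, retries=3))  # prints: retries exhausted
```

With the defaults, it is the timeout, not the retry count, that normally ends the loop.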
There is another configuration, “max.in.flight.requests.per.connection”: the maximum number of unacknowledged requests the client will send on a single connection before blocking. Its default value is 5.
So if a message is not acknowledged, it is retried. If messages sent after it are acknowledged before the retry succeeds, they are written to the partition ahead of it, i.e., the messages are reordered (et tu, Kafka……).
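This scenario can be reproduced with a toy simulation. It is a deliberately simplified model with three assumptions. The producer sends up to max_in_flight requests and then waits for all of their responses. The broker handles requests in arrival order. A failed request is resent after the ones already in flight. The names simulate and fail_once are hypothetical, and each batch is just a number.

```python
from collections import deque


def simulate(batches, fail_once, max_in_flight):
    """Return the order in which batches are appended to the partition log.

    Toy model: batches listed in `fail_once` hit a transient error on their
    first attempt and succeed when retried.
    """
    log = []
    pending = deque(batches)  # batches waiting to be sent (or re-sent)
    failed_before = set()
    while pending:
        # Send up to max_in_flight requests without waiting for any ack.
        n = min(max_in_flight, len(pending))
        in_flight = [pending.popleft() for _ in range(n)]
        retry = []
        for batch in in_flight:  # the broker handles them in arrival order
            if batch in fail_once and batch not in failed_before:
                failed_before.add(batch)
                retry.append(batch)   # not acknowledged, so it will be retried
            else:
                log.append(batch)     # acknowledged and appended
        # Retries go out after everything that was already in flight.
        pending.extendleft(reversed(retry))
    return log


print(simulate([1, 2, 3], fail_once={1}, max_in_flight=5))  # [2, 3, 1]
print(simulate([1, 2, 3], fail_once={1}, max_in_flight=1))  # [1, 2, 3]
```

With five requests in flight, batch 1 is overtaken by batches 2 and 3. With one in flight, nothing can overtake it.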
Don’t be disheartened. Although Kafka doesn’t preserve the order by default, a small tweak in the configuration restores guaranteed ordering, and your faith in humanity.
You need to set either of the following two configurations to ensure ordering:
- max.in.flight.requests.per.connection = 1
- enable.idempotence = true
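The following sketch shows why the idempotent producer (enable.idempotence = true) keeps order even with several requests in flight. It uses a much-simplified model of the real protocol. Each batch carries a sequence number, and the broker appends a batch only if its number is the next one it expects from that producer. Anything out of sequence is rejected and resent. The function simulate_idempotent is hypothetical. The config dictionaries only use the property names from the text, plus acks = all, which idempotence requires.

```python
from collections import deque

# Either setting restores ordering (property names as in the text).
ONE_IN_FLIGHT = {"max.in.flight.requests.per.connection": 1}
IDEMPOTENT = {
    "enable.idempotence": True,
    "acks": "all",                               # required by idempotence
    "max.in.flight.requests.per.connection": 5,  # must be <= 5
}


def simulate_idempotent(num_batches, fail_once, max_in_flight):
    """Toy model of broker-side sequence checking for one producer/partition."""
    log = []
    next_seq = 0                          # broker: next sequence number it accepts
    pending = deque(range(num_batches))   # batch i carries sequence number i
    failed_before = set()
    while pending:
        n = min(max_in_flight, len(pending))
        in_flight = [pending.popleft() for _ in range(n)]
        retry = []
        for seq in in_flight:             # broker handles requests in arrival order
            if seq in fail_once and seq not in failed_before:
                failed_before.add(seq)    # transient failure on first attempt
                retry.append(seq)
            elif seq != next_seq:
                retry.append(seq)         # out of sequence: rejected, resent later
            else:
                log.append(seq)           # expected sequence: appended
                next_seq += 1
        pending.extendleft(reversed(retry))
    return log


print(simulate_idempotent(3, fail_once={0}, max_in_flight=5))  # [0, 1, 2]
```

Compared with a single request in flight, this keeps up to five requests in the pipeline, so it typically costs less throughput. The real idempotent producer also de-duplicates retried batches that the broker had already written, which this model leaves out.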
Sigh!! After all, Kafka is a loyal friend. Happy Messaging!!