Expected Behavior
The idea is to follow an opportunistic approch to manual commits, independant of the Kafka batch size, with a specific parameter governing the maximum number of messages in transit (as does the Reactor Kafka library).
Current Behavior
Currently the asyncAcks mode follows a "commit per batch" strategy, and waits for a batch messages to be fully acknowledged before commiting them and going on with the next batch.
Context
The problem with the current asyncAcks approach is that it couples too much the batching strategy (batch size, etc.) with the commit strategy, and also with the maximum number of messages you want to accept "in transit" (uncommited), which also determines the maximum number of duplicates you may have when a crash occurs.
The proposed approach is closer to what is doing Reactor Kafka (which is now deprecated unfortunately), and is better because you can control your commit sliding window indepently of the Kafka batching strategy. And you never wait between Kafka batches if you remain inside your max sliding window of "in transit" messages, so performances are much better.
This also allows to better control the Kafka polling independently of the duration of the message processing (by pausing the Kafka subscription when necessary and doing a poll to keep the consumer alive), and to manage a hysteresis parameter to avoid "flickering state changes".
This is not that difficult to implement, using for exemple a LinkedHashMap index for each topic partition (to keep message aknowledgement state and maintain ordering, and index messages by Kafka offset), and a background thread that will cleanup the "in transit" messages sequentially and maintain an accumulator with the "most recent commitable offset", and then commiting this offset as soon as possible, possibly with other commit conditions (to allow to plug a custom commit strategy to control commit frequency).
Expected Behavior
The idea is to follow an opportunistic approch to manual commits, independant of the Kafka batch size, with a specific parameter governing the maximum number of messages in transit (as does the Reactor Kafka library).
Current Behavior
Currently the asyncAcks mode follows a "commit per batch" strategy, and waits for a batch messages to be fully acknowledged before commiting them and going on with the next batch.
Context
The problem with the current asyncAcks approach is that it couples too much the batching strategy (batch size, etc.) with the commit strategy, and also with the maximum number of messages you want to accept "in transit" (uncommited), which also determines the maximum number of duplicates you may have when a crash occurs.
The proposed approach is closer to what is doing Reactor Kafka (which is now deprecated unfortunately), and is better because you can control your commit sliding window indepently of the Kafka batching strategy. And you never wait between Kafka batches if you remain inside your max sliding window of "in transit" messages, so performances are much better.
This also allows to better control the Kafka polling independently of the duration of the message processing (by pausing the Kafka subscription when necessary and doing a poll to keep the consumer alive), and to manage a hysteresis parameter to avoid "flickering state changes".
This is not that difficult to implement, using for exemple a LinkedHashMap index for each topic partition (to keep message aknowledgement state and maintain ordering, and index messages by Kafka offset), and a background thread that will cleanup the "in transit" messages sequentially and maintain an accumulator with the "most recent commitable offset", and then commiting this offset as soon as possible, possibly with other commit conditions (to allow to plug a custom commit strategy to control commit frequency).