Race Condition and incorrectly dropped consensus messages #11082
Description
In `ethcore/src/client/client.rs`, the `queue()` function of `IoChannelQueue` does not guarantee that `self.currently_queued` is incremented before it is decremented.
This can cause `self.currently_queued` to underflow. If new consensus messages are being processed on another thread at the same time, they are then incorrectly dropped:
2019-09-19 09:17:54 UTC IO Worker #0 DEBUG sync 0 -> Dispatching packet: 21
2019-09-19 09:17:54 UTC IO Worker #0 TRACE sync Received consensus packet from 0
2019-09-19 09:17:54 UTC IO Worker #0 DEBUG poa Ignoring the message, error queueing: The queue is full (18446744073709551615)
2019-09-19 09:17:54 UTC IO Worker #0 DEBUG sync 0 -> Dispatching packet: 21
2019-09-19 09:17:54 UTC IO Worker #0 TRACE sync Received consensus packet from 0
2019-09-19 09:17:54 UTC IO Worker #0 DEBUG poa Ignoring the message, error queueing: The queue is full (18446744073709551615)
2019-09-19 09:17:54 UTC IO Worker #0 DEBUG sync 0 -> Dispatching packet: 21
2019-09-19 09:17:54 UTC IO Worker #0 TRACE sync Received consensus packet from 0
2019-09-19 09:17:54 UTC IO Worker #0 DEBUG poa Ignoring the message, error queueing: The queue is full (18446744073709551615)
2019-09-19 09:17:54 UTC IO Worker #2 DEBUG sync 0 -> Dispatching packet: 21
2019-09-19 09:17:54 UTC IO Worker #2 TRACE sync Received consensus packet from 0
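The value `18446744073709551615` in the log is `usize::MAX` on a 64-bit target, which is what an unsigned counter holds after being decremented from zero. The following minimal sketch replays that interleaving sequentially so it is deterministic. It is not the code from `client.rs`: the helper names `is_full` and `replay_underflow` are hypothetical, and it assumes the counter is an `AtomicUsize` that is decremented by the worker and incremented only after the send.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the queue size check described in this issue.
fn is_full(currently_queued: &AtomicUsize, limit: usize) -> bool {
    currently_queued.load(Ordering::SeqCst) >= limit
}

/// Replays the problematic ordering step by step and returns
/// (value seen by a concurrent check, whether that check rejects, final value).
fn replay_underflow(limit: usize) -> (usize, bool, usize) {
    let currently_queued = AtomicUsize::new(0);

    // Step 1: the message has been sent and a worker already runs it,
    // so the decrement happens before the sender's increment.
    // `fetch_sub` wraps around on underflow instead of panicking.
    currently_queued.fetch_sub(1, Ordering::SeqCst);

    // Step 2: another thread tries to queue a message and reads the counter.
    let observed = currently_queued.load(Ordering::SeqCst);
    let rejected = is_full(&currently_queued, limit);

    // Step 3: the original sender finally performs its increment,
    // which wraps the counter back to zero.
    currently_queued.fetch_add(1, Ordering::SeqCst);

    (observed, rejected, currently_queued.load(Ordering::SeqCst))
}
```

In this replay, any message that arrives during step 2 is rejected as "queue is full" although the queue is empty.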
We have encountered this issue quite frequently when deploying Honey Badger validators. It causes the Honey Badger Consensus Engine to become stuck in configurations with low node counts.
Removing the queue size check promptly fixed the issue. One possible fix for the underflow is to increment `self.currently_queued` before calling `channel.send()`, and to decrement it again if `channel.send()` returns an error.
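A rough sketch of that ordering is shown below. It is not a patch against `client.rs`: `BoundedQueue`, `Job`, and `QueueError` are hypothetical names, and a `std::sync::mpsc` channel stands in for the real IO channel.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::Sender;
use std::sync::Arc;

/// Hypothetical job type; the real channel carries client IO messages.
type Job = Box<dyn FnOnce() + Send + 'static>;

#[derive(Debug, PartialEq)]
enum QueueError {
    Full(usize),
    Channel,
}

/// Hypothetical stand-in for `IoChannelQueue`.
struct BoundedQueue {
    currently_queued: Arc<AtomicUsize>,
    limit: usize,
}

impl BoundedQueue {
    fn new(limit: usize) -> Self {
        BoundedQueue { currently_queued: Arc::new(AtomicUsize::new(0)), limit }
    }

    fn queue<F>(&self, channel: &Sender<Job>, count: usize, fun: F) -> Result<(), QueueError>
    where
        F: FnOnce() + Send + 'static,
    {
        if self.currently_queued.load(Ordering::SeqCst) >= self.limit {
            return Err(QueueError::Full(self.limit));
        }

        // Increment BEFORE sending, so the worker's decrement can never run first.
        self.currently_queued.fetch_add(count, Ordering::SeqCst);

        let counter = Arc::clone(&self.currently_queued);
        let job: Job = Box::new(move || {
            counter.fetch_sub(count, Ordering::SeqCst);
            fun();
        });

        if channel.send(job).is_err() {
            // The job was never queued, so undo the increment.
            self.currently_queued.fetch_sub(count, Ordering::SeqCst);
            return Err(QueueError::Channel);
        }
        Ok(())
    }
}
```

With this ordering the counter cannot drop below zero. The size check and the increment are still two separate steps, so concurrent senders may briefly exceed `limit`. That makes the limit a soft bound, but it does not cause messages to be dropped incorrectly.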