Fix data races triggered by functional tests. #4247
Conversation
Force-pushed from ecb339b to 0a34a4b
Force-pushed from 323dc8d to fe7bba9
utACK
Thanks for finding and fixing these issues! 👍
A few notes:
- it's better to avoid holding locks for too long, to lower the risk of introducing deadlocks at some point (d876781)
- we should always lock `quorumVvecCs` whenever we access `quorumVvec` now that it's `GUARDED_BY(quorumVvecCs)` (099909a); see the sketch after these notes
- running `p2p_quorum_data.py` revealed a similar issue on my machine, this time for `skShare` (da44228)
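A minimal sketch of the pattern described in the second note, with standard-library types standing in for the codebase's `RecursiveMutex`/`LOCK` helpers and a no-op `GUARDED_BY` placeholder; the class and method names are illustrative, not the PR's actual code:

```cpp
// Sketch only: the real code uses RecursiveMutex, LOCK() and the GUARDED_BY
// macro from threadsafety.h (which maps to clang's guarded_by attribute);
// plain std:: types stand in here and GUARDED_BY is a no-op placeholder.
#include <mutex>
#include <vector>

#define GUARDED_BY(x) // real definition: __attribute__((guarded_by(x)))

class CQuorumStub // illustrative stand-in, not the PR's CQuorum
{
public:
    mutable std::recursive_mutex quorumVvecCs;
    // Every read or write of quorumVvec must be done with quorumVvecCs held.
    std::vector<unsigned char> quorumVvec GUARDED_BY(quorumVvecCs);

    size_t GetVvecSize() const
    {
        // Hold the lock only around the access itself and release it right
        // away, so no other lock is ever taken while this one is held.
        std::lock_guard<std::recursive_mutex> lock(quorumVvecCs);
        return quorumVvec.size();
    }
};

int main()
{
    CQuorumStub q;
    return static_cast<int>(q.GetVvecSize());
}
```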
Force-pushed from 3531516 to a28e1f5
re-utACK
Needs rebase
Force-pushed from 9540a56 to 018a7aa
utACK
This pull request has conflicts, please rebase.
Force-pushed from 8b82d6a to be928d9
Force-pushed from da89dcd to 7955fb3
This pull request has conflicts, please rebase.
Force-pushed from 4543fab to 3778e3f
Yes, I finished all the fixes now; waiting for the builds.
Force-pushed from 3778e3f to 9047608
There was an additional data race / deadlock I detected on a run of feature_block.py, which I forwarded to @gabriel-bjg. My understanding is he's planning to fix that and push.
This pull request has conflicts, please rebase.
Force-pushed from 9047608 to d7501e0
Regarding the data races: they are all triggered by zmq and libzmq. According to the Bitcoin tsan suppressions, zmq and libzmq data races are all suppressed (https://github.com/bitcoin/bitcoin/blob/master/test/sanitizer_suppressions/tsan). As agreed with @PastaPastaPasta, we will ignore them for now and won't fix anything related to them in this PR. The possible deadlock reported by tsan is also a false positive. Below is the scenario which triggers the lock-order inversion report:
Basically, what tsan detects is T19 locking cs_main (first acquisition), mempool.cs, m_control_mutex, cs_main (second acquisition; it is already held, but it's a recursive mutex), cgovernancemanager.cs, while T18 locks cs_main, then mempool.cs. The possible-deadlock report comes from the fact that tsan is not able to detect that cs_main was already locked before mempool.cs was taken on T19. So it sees T19 locking mempool.cs, m_control_mutex, cs_main, while it sees T18 locking cs_main and then mempool.cs.
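A minimal, self-contained sketch of that acquisition order, with standard-library recursive mutexes standing in for cs_main and mempool.cs; the other locks are omitted and the thread functions are illustrative, not the PR's actual code:

```cpp
// Two threads take the same two recursive mutexes. T19 re-enters "cs_main"
// while already holding it; a checker that does not model the re-entrant
// acquisition as "already held" sees mempool_cs -> cs_main on T19 and
// cs_main -> mempool_cs on T18, and may flag a lock-order inversion even
// though the real order is consistent and no deadlock is possible.
#include <mutex>
#include <thread>

std::recursive_mutex cs_main;    // stands in for cs_main
std::recursive_mutex mempool_cs; // stands in for mempool.cs

void RunT19()
{
    std::lock_guard<std::recursive_mutex> l1(cs_main);    // first acquisition
    std::lock_guard<std::recursive_mutex> l2(mempool_cs);
    std::lock_guard<std::recursive_mutex> l3(cs_main);    // re-entrant, same thread
}

void RunT18()
{
    std::lock_guard<std::recursive_mutex> l1(cs_main);
    std::lock_guard<std::recursive_mutex> l2(mempool_cs);
}

int main()
{
    std::thread t19(RunT19);
    std::thread t18(RunT18);
    t19.join();
    t18.join();
}
```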
- Function CWallet::KeepKey requires locking as it has concurrent access to the database and the member nKeysLeftSinceAutoBackup.
- Avoid a data race when reading the size of setInventoryTxToSend by locking the read; if locking happens after the read, the size may change.
- Lock cs_mnauth when reading verifiedProRegTxHash.
- Make fRPCRunning atomic as it can be read/written from different threads simultaneously (see the sketch after this list).
- Make m_masternode_iqr_connection atomic as it can be read/written from different threads simultaneously.
- Use a recursive mutex to synchronize concurrent access to quorumVvec.
- Make m_masternode_connection atomic as it can be read/written from different threads simultaneously.
- Make m_masternode_probe_connection atomic as it can be read/written from different threads simultaneously.
- Use a recursive mutex to lock access to activeMasterNode.
- Use a recursive mutex to synchronize concurrent access to skShare.
- Guard all mnauth fields of a CNode.
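For the "make the flag atomic" items, a minimal sketch assuming plain std::atomic<bool>; the flag and function names below are illustrative stand-ins, not the PR's declarations:

```cpp
// A boolean flag written by one thread and polled by another. A plain bool
// here would be a data race (undefined behaviour); std::atomic<bool> makes
// the concurrent reads and writes well-defined, which is the pattern applied
// to fRPCRunning, m_masternode_connection and the other flags listed above.
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> fRunning{false}; // illustrative stand-in for fRPCRunning

void Worker()
{
    while (fRunning.load()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10)); // do work
    }
}

int main()
{
    fRunning.store(true);
    std::thread t(Worker);
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    fRunning.store(false); // concurrent write, safe because the flag is atomic
    t.join();
}
```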
Force-pushed from d7501e0 to b85fc7a
utACK
utACK for squash merge
Already fixed here: 41190e9 ("Fix data races triggered by functional tests. (dashpay#4247)", 2021-07-26)

6c10037 rpc: Fix data race (UB) in InterruptRPC() (practicalswift)

Pull request description:

Fix data race (UB) in `InterruptRPC()`.

Before:

```
$ ./configure --with-sanitizers=thread
$ make
$ test/functional/test_runner.py feature_shutdown.py
…
SUMMARY: ThreadSanitizer: data race rpc/server.cpp:314 in InterruptRPC()
…
ALL | ✖ Failed | 2 s (accumulated)
```

After:

```
$ ./configure --with-sanitizers=thread
$ make
$ test/functional/test_runner.py feature_shutdown.py
…
ALL | ✓ Passed | 3 s (accumulated)
```

Tree-SHA512: b139ca1a0480258f8caa7730cabd7783a821d906630f51487750a6b15b7842675ed679747e1ff1bdade77d248807e9d77bae7bb88da54d1df84a179cd9b9b987
- Function CWallet::KeepKey requires locking as it has concurrent access to the database and the member nKeysLeftSinceAutoBackup.
- Avoid a data race when reading the size of setInventoryTxToSend by locking the read; if locking happens after the read, the size may change.
- Lock cs_mnauth when reading verifiedProRegTxHash.
- Make fRPCRunning atomic as it can be read/written from different threads simultaneously.
- Make m_masternode_iqr_connection atomic as it can be read/written from different threads simultaneously.
- Use a recursive mutex to synchronize concurrent access to quorumVvec.
- Make m_masternode_connection atomic as it can be read/written from different threads simultaneously.
- Make m_masternode_probe_connection atomic as it can be read/written from different threads simultaneously.
- Use a recursive mutex to lock access to activeMasterNode.
- Use a recursive mutex to synchronize concurrent access to skShare.
- Guard all mnauth fields of a CNode.

Co-authored-by: UdjinM6 <[email protected]>
The thread sanitizer reports data races when running the functional tests. Steps to reproduce:
Fixes: