Skip to content

fix: assert in signing_shares for quorums with 3 members but 2 nodes only #6261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 23, 2024

Conversation

knst
Copy link
Collaborator

@knst knst commented Sep 10, 2024

Issue being fixed or feature implemented

Currently we have several quorums which have size 3 with threshold 2 nodes: llmq_test_instantsend, llmq_test_platform, llmq_test and they are used on RegTest.

For extreme case when only 2 nodes exist the assert happens:

  AssertionError: Unexpected stderr dashd: llmq/signing_shares.cpp:812: static CDeterministicMNCPtr llmq::CSigSharesManager::SelectMemberForRecovery(const llmq::CQuorumCPtr&, const uint256&, size_t): Assertion `size_t(attempt) < quorum->members.size()' failed.
  Posix Signal: Aborted
     0#: (0x5BF40CE70DA2) stl_vector.h:115       - std::_Vector_base<unsigned long, std::allocator<unsigned long> >::_Vector_impl_data::_M_copy_data(std::_Vector_base<unsigned long, std::allocator<unsigned long> >::_Vector_impl_data const&)
     1#: (0x5BF40CE70DA2) stl_vector.h:127       - std::_Vector_base<unsigned long, std::allocator<unsigned long> >::_Vector_impl_data::_M_swap_data(std::_Vector_base<unsigned long, std::allocator<unsigned long> >::_Vector_impl_data&)
     2#: (0x5BF40CE70DA2) stl_vector.h:1962      - std::vector<unsigned long, std::allocator<unsigned long> >::_M_move_assign(std::vector<unsigned long, std::allocator<unsigned long> >&&, std::integral_constant<bool, true>)
     3#: (0x5BF40CE70DA2) stl_vector.h:771       - std::vector<unsigned long, std::allocator<unsigned long> >::operator=(std::vector<unsigned long, std::allocator<unsigned long> >&&)
     4#: (0x5BF40CE70DA2) stacktraces.cpp:777    - HandlePosixSignal
     5#: (0x7664C9245320) libc_sigaction.c       - ???
     6#: (0x7664C929EB1C) pthread_kill.c:44      - __pthread_kill_implementation
     7#: (0x7664C929EB1C) pthread_kill.c:78      - __pthread_kill_internal
     8#: (0x7664C929EB1C) pthread_kill.c:89      - __GI___pthread_kill
     9#: (0x7664C924526E) raise.c:27             - __GI_raise
    10#: (0x7664C92288FF) abort.c:81             - __GI_abort
    11#: (0x7664C922881B) loadmsgcat.c:1177      - _nl_load_domain
    12#: (0x7664C923B507) <unknown-file>         - ???
    13#: (0x5BF40C6E88C8) signing_shares.cpp:823 - llmq::CSigSharesManager::SelectMemberForRecovery(std::shared_ptr<llmq::CQuorum const> const&, uint256 const&, unsigned long)
    14#: (0x5BF40C94A285) quorums.cpp:737        - operator()
    15#: (0x5BF40C94A514) std_function.h:292     - _M_invoke
    16#: (0x5BF40CE082C6) util.cpp:510           - RPCHelpMan::HandleRequest(JSONRPCRequest const&) const
    17#: (0x5BF40C89824A) univalue.h:17          - UniValue::operator=(UniValue&&)
    18#: (0x5BF40C89824A) server.h:108           - CRPCCommand::CRPCCommand(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, RPCHelpMan (*)())::{lambda(JSONRPCRequest const&, UniValue&, bool)#1}::operator()(JSONRPCRequest const&, UniValue&, bool) const
    19#: (0x5BF40C9976F4) std_function.h:591     - std::function<bool (JSONRPCRequest const&, UniValue&, bool)>::operator()(JSONRPCRequest const&, UniValue&, bool) const
    20#: (0x5BF40C9976F4) server.cpp:622         - ExecuteCommand
    21#: (0x5BF40C99879F) server.cpp:511         - ExecuteCommands
    22#: (0x5BF40C99879F) server.cpp:543         - CRPCTable::execute(JSONRPCRequest const&) const
    23#: (0x5BF40CB75F24) httprpc.cpp:247        - HTTPReq_JSONRPC

Discovered during implementation of https://github.com/dashpay/dash-issues/issues/77

What was done?

Changed condition in assert, implemented special case of using Nth element from array size N for SelectMemberForRecovery, added test for this case.

How Has This Been Tested?

Improved functional test feature_asset_locks.py to test this corner case for quorum llmq_test_instantsend and llmq_test_platform

Breaking Changes

N/A

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone

@knst knst force-pushed the fix-signing-share-assert branch 6 times, most recently from fc8f802 to 7690e8c Compare September 17, 2024 11:45
@knst knst changed the title fix: assert in signing_shares - amount of members can match with amount of attemps fix: assert in signing_shares for quorums with 3 members but 2 nodes only Sep 17, 2024
@knst knst marked this pull request as ready for review September 17, 2024 12:43
@@ -809,7 +809,7 @@ void CSigSharesManager::TryRecoverSig(const CQuorumCPtr& quorum, const uint256&

CDeterministicMNCPtr CSigSharesManager::SelectMemberForRecovery(const CQuorumCPtr& quorum, const uint256 &id, size_t attempt)
{
assert(size_t(attempt) < quorum->members.size());
assert(size_t(attempt) <= quorum->members.size());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. However if we are going to allow attempt to wrap around then quorum->members.size() is actually irrelevant here imo, attempt should be limited by recoveryMembers instead.

From src/llmq/params.h:

    // How many members should we try to send all sigShares to before we give up.
    int recoveryMembers;

Pls consider 8ff9ba1.

@knst knst force-pushed the fix-signing-share-assert branch from 7690e8c to f44edde Compare September 18, 2024 11:58
@knst knst requested a review from UdjinM6 September 18, 2024 12:17
Copy link

@UdjinM6 UdjinM6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK f44edde

@@ -807,9 +807,9 @@ void CSigSharesManager::TryRecoverSig(const CQuorumCPtr& quorum, const uint256&
sigman.ProcessRecoveredSig(rs);
}

CDeterministicMNCPtr CSigSharesManager::SelectMemberForRecovery(const CQuorumCPtr& quorum, const uint256 &id, size_t attempt)
CDeterministicMNCPtr CSigSharesManager::SelectMemberForRecovery(const CQuorumCPtr& quorum, const uint256 &id, int attempt)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: clang format complains

Suggested change
CDeterministicMNCPtr CSigSharesManager::SelectMemberForRecovery(const CQuorumCPtr& quorum, const uint256 &id, int attempt)
CDeterministicMNCPtr CSigSharesManager::SelectMemberForRecovery(const CQuorumCPtr& quorum, const uint256& id, int attempt)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it if need to force-push again.

Copy link
Member

@PastaPastaPasta PastaPastaPasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK f44edde

@UdjinM6 UdjinM6 added this to the 21.2 milestone Sep 22, 2024
@PastaPastaPasta PastaPastaPasta merged commit 2b92b3e into dashpay:develop Sep 23, 2024
32 of 37 checks passed
@UdjinM6 UdjinM6 modified the milestones: 21.2, 22 Oct 29, 2024
@knst knst deleted the fix-signing-share-assert branch November 11, 2024 07:34
@knst knst mentioned this pull request Nov 28, 2024
5 tasks
PastaPastaPasta added a commit that referenced this pull request Apr 29, 2025
d33ca99 refactor: use `quorum->params.size` instead function argument (Konstantin Akimov)
b3aeaa4 feat: let llmq_test_platform be single-node too (Konstantin Akimov)
702d147 feat: new single node quorum and functional test feature_llmq_singlenode.py (Konstantin Akimov)
67eb2e4 fix: activate dip0008 in feature_dip4_coinbasemerkleroots without extra blocks (Konstantin Akimov)
0cd82fe chore: add TODO in case of testnet3 reset, to deduplicate dkgsession code, remove TODO to create evo nodes without IS (Konstantin Akimov)

Pull request description:

  ## Issue being fixed or feature implemented
  Development for platform requires currently to have running 4 Dash Core instances: wallet and at least 3 Evo Nodes which forms quorum.
  This issue has been partially addressed #6261 by allowing to use only 2 Evo Nodes. Though, previous solution still requires all DKG steps, heavy network communication and various instability.

  ## What was done?
  Added support for RegTest quorums `llmq_test` and `llmq_test_instantsend` to be form and sign messages even if only one node is available.
  While regular quorum with several nodes 7 steps of creation (Initialization, Contribution, Complaining, Justification, Commitment, Finalization, Mining) this type of quorum just from Initialization straight to the Finalization.
  The signing process with this quorum is also simplified. So far as regular quorum requires to sign multiple shares, for single node quorum the messages are signed immediately by just one node.

  Though, single-node-quorum doesn't generate new pair of private and public keys, instead it just uses operator private and public key. Not sure if there's any possible down-sides.

  It is an alternate solution to #6437 which introduces brand-new quorum type.
  Advantages of this solution is:
   - no breaking changes
   - no new quorums introduced
   - these quorums with one node can be both types "evo only" (as platform quorum) and regular "evo and regular" nodes
   - it's possible to sign EHF signals too

  ## How Has This Been Tested?
  See a new functional test feature_llmq_singlenode.py

  To use this feature Dash Core should be started with these 3 params: `-llmqtestparams=1:1 -llmqtestinstantsendparams=1:1 -lmqtestplatformparams=1:1`. It let to InstantSend, Chainlocks to be formed, EHF signals appears, messages to be signed, etc with just one masternode or evo node.

  ## Breaking Changes
  N/A

  ## Checklist:
  - [x] I have performed a self-review of my own code
  - [x] I have commented my code, particularly in hard-to-understand areas
  - [x] I have added or updated relevant unit/integration/functional/e2e tests
  - [x] I have made corresponding changes to the documentation
  - [x] I have assigned this pull request to a milestone

ACKs for top commit:
  UdjinM6:
    utACK d33ca99
  PastaPastaPasta:
    utACK d33ca99

Tree-SHA512: 1620fd71ac691e2bd7e6f0600a5fc123fd01a6986507fd7cfc2354608a3ae8b663221f6459dddede8fc95b0c38ae53a09f3878509d5a7dc2d4398333d510f735
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants