Skip to content

P2P: Feature: Transaction notice #1499

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Jun 19, 2025
Merged

P2P: Feature: Transaction notice #1499

merged 33 commits into from
Jun 19, 2025

Conversation

heifner
Copy link
Member

@heifner heifner commented May 5, 2025

P2P Transaction propagation enhancement

  • No change for API RPC calls
  • Transaction received over P2P connection
    • Record trx id and connection id and that we have the trx
    • If trx id already received then drop the transaction
    • If trx.size() > 200 bytes
      • send to all connections
        transaction_notice_message { 
           transaction_id_type id; 
        };
      
      • If transaction_notice_message or trx (pending tracked ids) exceeds a maximum size, disconnect.
      • When transaction received, verify trx expiration not more than on-chain configured max.
    • Queue transaction for speculative execution
  • If transaction fails to speculatively execute
    • Drop it
    • Do not immediately remove trx id from recorded transactions
      • Prevents immediately trying it again if received from another connection
  • If transaction succeeds
    • Broadcast to all P2P connections not received trx or transaction-notify from
  • On receive of transaction_notice_message record trx id and connection id but not that we have the transaction

Notes:

A transaction request message is not needed. Transactions are broadcast to nodes that are known not to have the transaction. A node knows a connection does not have a transaction because it has not received the trx or a transaction_notice_message from that connection.

transaction_notice_message is not propagated when received. transaction_notice_message is only sent when a transaction is received. Since transactions are verified (speculatively executed) before being broadcast this will normally provide enough time for transaction_notice_message to be received from nodes that already have the transaction.

Backward compatible with previous versions of net protocol. Will not send transaction_notice_message to any connections not at latest net protocol. Since no transaction_notice_message received from older versions all transactions will be broadcast to them if not received from them.

Resolves #1475

@heifner heifner added the OCI Work exclusive to OCI team label May 5, 2025
@heifner heifner changed the title P2P Feature: Transaction notice P2P: Feature: Transaction notice May 9, 2025
@heifner heifner requested review from greg7mdp and linh2931 May 15, 2025 11:56
Copy link
Contributor

@greg7mdp greg7mdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unordered_flat_set is much nicer :-)

if (auto tptr = id_idx.find( id ); tptr != id_idx.end()) {
if (tptr->connection_ids.insert(c.connection_id).second)
++c.trx_entries_size;
already_have_trx = tptr->have_trx;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that maybe we should update tptr->expires if tptr->have_trx == false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each node has a configurable amount of time it will keep trxs entries around. If we update it then we are not really honoring that limit if they keep being sent to us.

Copy link
Contributor

@greg7mdp greg7mdp May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each node has a configurable amount of time it will keep trxs entries around. If we update it then we are not really honoring that limit if they keep being sent to us.

We store this expiration limit per transaction, not per connection. How can we honor the limit that each node sends?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each node does not send us an expiration. Our node has an expiration for how long it will remember a trx. I don't think another coming into the node should reset the expiration.

Copy link
Contributor

@greg7mdp greg7mdp May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have the expiration in the notice then? It is somewhat irrelevant imo. I think it would make sense to have the expiration only when we have the transaction (so maybe make it a std::optional in node_transaction_state, and remove from notice message).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra few bytes for the expiration in the notice I doubt would make little difference but it does allow the receiver of the notice to potentially remove it sooner. Note there can/will be many notices sent where a trx is never sent. If the trx fails to subjectively execute on the node then it will never be sent to its peers. So imagine a node that is being spammed with trxs that fail. That node is going to spam all its peers with trx notice messages.

Copy link
Contributor

@greg7mdp greg7mdp May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra few bytes for the expiration in the notice I doubt would make little difference

yes, I agree

but it does allow the receiver of the notice to potentially remove it sooner

Sooner is a doubtful benefit. If a node is spammed with transactions that fail, why would the spammer provide a short expiration time.
I think maybe we should have a member in node_transaction_state which would be:
timestamp first_notice_received;,
and in expire_txns, we would remove all transactions where
have_trxn == false and for which now() - first_notice_received > max_trx_lifetime

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In at least one common case of spam trxs they use a small (or default) expiration. If you are spamming a trx in as to hit some defi condition or win some game. You send many trx that fail. You are not trying to bring down the node, you are just trying to hit a condition on-chain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually since there is a potential large delay in the notice and actually receiving the trx, I can see updating the expiration when actually receiving the trx. I'll make that change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto& conn_idx = connections.get<by_connection_id>();
for (auto [conn_id, count] : expired_trxs_for_connection) {
if (auto itr = conn_idx.find( conn_id ); itr != conn_idx.end()) {
itr->c->trx_entries_size -= count;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the notices should affect trx_entries_size.
maybe instead of:

mutable bool have_trx = false;

have:

std::optional<uint32_t> trx_connection_source; // if we have received the trx, connection_id which sent it.

actually, because we can receive the same transaction from multiple connections, we probably should have:

mutable connection_id_set connection_srcs; // connections which have sent us this transaction

This is a lot of effort for managing this trx_entries_size count, I wonder if it is worth it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is currently not a use-case for keeping track of who sent us an actual trx.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must misunderstand the purpose of c->trx_entries_size? Why do we need to track this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is to prevent someone just spamming the node with transaction_notice_messages until it grinds to a halt.

Copy link
Contributor

@greg7mdp greg7mdp May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just have a counter on the connection, which is incremented every time we receive a notice, and reset to 0 every max_trx_lifetime (we can check whenever we receive a notice on a connection whether the counter needs to be reset by keeping a notice_counter_last_reset_time).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we expect a honest node to send us 100,000 notices for transactions that fail within max_trx_lifetime?

Maybe we should have a variant of the notice message to notify that the transaction failed and will never be sent, so we can clean it up immediately from our multiindex (in theory we whould have received it only from this peer, so the connection_ids unordered_flat_map should have only one entry.

Copy link
Contributor

@greg7mdp greg7mdp May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would eliminate the need for the expiration time in the notice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we expect a honest node to send us 100,000 notices for transactions that fail within max_trx_lifetime?

Or succeed. We don't remove them on success until expire either.

The failure notice is interesting. Lets see what @arhag thinks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't remove them on success until expire either.

If they succeed, we will receive the transaction and update the expire in the multi-index accordingly, so we wouldn't rely on the expire from the notice anyways.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking some more about this over the weekend. We already have p2p-dedup-cache-expire-time-sec which defaults to 10 seconds. So by default the longest a trx/trx-notice entry is kept is 10 seconds, not 1hr.

Why not just have a counter on the connection, which is incremented every time we receive a notice, and reset to 0 every max_trx_lifetime (we can check whenever we receive a notice on a connection whether the counter needs to be reset by keeping a notice_counter_last_reset_time).

This would be much simpler and faster. We could reset the connection entry counter every p2p-dedup-cache-expire-time-sec. We set the max on the counter to a hard-coded 200,000. By default that allows for 20,000 TPS. A user could increase that by decreasing the p2p-dedup-cache-expire-time-sec.

@linh2931
Copy link
Member

Should transaction_notify be transaction_notice_message, in the 3rd paragraph from the bottom of the PR description ?

@greg7mdp
Copy link
Contributor

@heifner any thoughts about my telegram note:

Another idea which maybe is an alternative to the transaction notice:
For each connection:

  1. every 1/10th of a second or so, send to the peer a vector of transaction ids (just the ids) that this node has validated since the last time, and that we are ready to propagate. Save the time the list was computed (we can use the same list for all peers).
  2. the peer answers with, either a vector of transaction ids that it needs, or a bloom filter of the same to reduce traffic (false positives just mean that we may send a few transactions that our peer already has)
  3. we send to the peer, in a single i/o, a list of all the transactions it has requested.

This would reduce traffic and the number of i/o requests significantly, and, similarly as the notice, avoid sending a node a transaction it already has.

The more I think of it, the more I feel that this would be better than the transaction notice. It reduces the number of i/o requests drastically. Even if we have 2000 trx ids to send every 1/10th of a second (if we are doing 20,000 tps), that still a single message, of size only 64KB, and same for the answer. Also we send all the requested transactions batched together.

It is also simpler than the notice feature I think, which requires updating info in the multiindex for every notice, and consuming memory for every transaction/notice. This new suggestion has very minimal overhead.

heifner added 7 commits May 23, 2025 11:59
…e can't be trusted, just want to use p2p_dedup_cache_expire_time_us. Also since a node uses the minimum of p2p_dedup_cache_expire_time_us and transaction expiration, there is no need to validate expiration in net_plugin (it will be validated by controller).
… of p2p_dedup_cache_expire_time_us instead of number of entries can be added.
Copy link
Member

@linh2931 linh2931 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like use proto_version_t instead of uint16_t.

Should some tests be added for the new message?

@heifner heifner merged commit 446e16c into main Jun 19, 2025
36 checks passed
@heifner heifner deleted the GH-1475-trx-notify branch June 19, 2025 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCI Work exclusive to OCI team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

P2P: Add transaction notify
3 participants