
Beacon sync maintenance update and header download fix #3269


Merged
merged 8 commits into master from Beacon-sync-maintenance-update-and-header-download-fix
May 8, 2025

Conversation

mjfh
Contributor

@mjfh mjfh commented May 8, 2025

No description provided.

mjfh added 8 commits May 8, 2025 16:05
why:
  Must unlink closure that refers to the `hc` descriptor. `stop()` is
  also used in the destructor.
why:
  Might crash repeatedly while debugging.
why:
  Previously, `finalised` was local and static. So it was known that
  the chain `antecedent..finalised` was a segment on the canonical
  chain shared with the header chain cache.

  This was exploited in some fringe cases for `FC` parent detection
  when there was some interference with the FCU calls from the `CL`
  via RPC.

  Now, the `finalised` entry is maintained outside. So the assumption
  that `antecedent..finalised` is on the header chain cache does not
  necessarily hold anymore.
why:
  Previously, the `importBlock()` function was synchronous, i.e. non-async.
  This has changed: the function now returns a `Future` which can raise a
  `CancelledError` exception.
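For illustration, a minimal sketch of what this change means for callers, assuming chronos-style async; `ChainRef`, `EthBlock` and `importNext` are placeholder names, not the actual syncer API:

```nim
import chronos

type
  ChainRef = ref object     # stand-in for the real chain descriptor
  EthBlock = object         # stand-in for the real block type

proc importBlock(c: ChainRef; blk: EthBlock) {.async.} =
  discard                   # placeholder for the real, now async, import

proc importNext(c: ChainRef; blk: EthBlock) {.async.} =
  try:
    # `importBlock()` now returns a `Future`, so it must be awaited and the
    # caller has to cope with a `CancelledError` escaping the call.
    await c.importBlock(blk)
  except CancelledError:
    # Re-raise so that cancellation unwinds the syncer cleanly.
    raise
```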
…rors

why:
  PR #3204 introduced the concept of generally keeping the last peer
  even if syncing becomes nearly useless. This was an attempt to exploit
  the last peer even if it is labelled `slow`.

  This generality led to a kind of syncer loop when fetching from a peer
  that repeatedly causes a disconnect exception, unless the peer is
  fully discarded by the p2p lib.

  This has been changed: a `slow` sync peer is kept if it is the last one,
  but is zombified (i.e. discarded and kept from reconnecting for a while)
  otherwise.
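For illustration, a rough sketch of the revised policy; `SyncPeer`, `zombify` and `onPeerTrouble` are invented names, not the actual scheduler API:

```nim
type
  SyncPeer = ref object
    isSlow: bool            # peer has been labelled `slow`

proc zombify(peer: SyncPeer) =
  # Stand-in for discarding the peer and blocking reconnects for a while.
  discard

proc onPeerTrouble(peer: SyncPeer; nPeersLeft: int) =
  if nPeersLeft == 1 and peer.isSlow:
    discard                 # keep the last sync peer if it is merely `slow`
  else:
    peer.zombify()          # otherwise discard and block it for a while
```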
…ally

why:
  This feature was accidentally removed in PR #3221. It has been re-added
  together with an extensive explanatory comment.

  The problem was that under some circumstances, two peers could appear to
  deterministically (i.e. addressed by hash) fetch different header ranges
  while in reality they fetched the same range. This led the syncer to
  wrongly assume that everything was downloaded when it was not.

also:
  The fetch routine double checks that at least the block number of the
  first header is correct. So fetching by hash (i.e. deterministically)
  will verify that both the hash and the received block number match
  expectations.
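A small sketch of that extra check, with a simplified `Header` type and an invented helper name:

```nim
type
  Header = object
    number: uint64          # block number carried by the header

proc firstHeaderOk(headers: openArray[Header]; expected: uint64): bool =
  # When fetching by hash (i.e. deterministically), at least the first
  # received header's block number is verified against the expected value.
  headers.len > 0 and headers[0].number == expected
```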
@mjfh mjfh merged commit 565d868 into master May 8, 2025
5 checks passed
@mjfh mjfh deleted the Beacon-sync-maintenance-update-and-header-download-fix branch May 8, 2025 15:57
@@ -191,31 +186,42 @@ proc delInfo(db: KvtTxRef) =


proc putHeader(db: KvtTxRef; h: Header) =
-  ## Store rlp encoded header
+  ## Store the argument `header` indexed by block number, and the hash lookup
+  ## of the parent header.
Member

why the number of the parent header and not the current one?

Contributor Author

Seems to avoid computing a hash, cc: @jangko
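A loose sketch of the idea behind that answer, using plain in-memory tables rather than the actual `KvtTxRef` layout, and assuming the lookup maps the parent hash to the parent's block number: the parent hash is already a field of the header, so the lookup entry costs nothing extra, whereas indexing the header's own hash would first require RLP-encoding and keccak-hashing it.

```nim
import std/tables

type
  Hash32 = array[32, byte]
  Header = object
    number: uint64
    parentHash: Hash32

var
  byNumber: Table[uint64, Header]    # header payload keyed by block number
  numByHash: Table[Hash32, uint64]   # hash -> block number lookup

proc putHeaderSketch(h: Header) =
  byNumber[h.number] = h
  # `h.parentHash` comes for free as a header field; the header's own hash
  # would first have to be computed from its RLP encoding. Assuming
  # consecutive numbering, the parent sits at `h.number - 1`.
  numByHash[h.parentHash] = h.number - 1
```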

mjfh added a commit that referenced this pull request May 13, 2025
why:
  Previously (before PR #3204) the sync peer simply would have been
  zombified and discarded so that there were no sync peers left.

  Then PR #3204 introduced a concept of ignoring any error of the last
  peer via the `infectedByTVirus()` function. This opened a can of worms
  which was mitigated by PR #3269 by only keeping the last sync peer
  non-zombified if it was labelled `slow`.

  The last measure can lead to a heavy syncer slow down while queuing
  blocks if there is only a slow peer available. It will try to fill
  the queue first while it makes more sense to import blocks allowing
  the syncer to collect more sync peers.

  This patch registers the current peer as the last one labelled slow.
  It is up to other functions to exploit that fact.
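Sketched below with made-up names (`SyncCtx`, `registerSlowPeer`, `singleSlowPeerLeft`): the patch merely records the peer, and other parts of the syncer can then act on that state, e.g. by starting block import early.

```nim
type
  SyncPeer = ref object
  SyncCtx = ref object
    lastSlowPeer: SyncPeer  # most recent sync peer labelled `slow`, if any

proc registerSlowPeer(ctx: SyncCtx; peer: SyncPeer) =
  # Registration only; it is up to other functions to exploit this fact.
  ctx.lastSlowPeer = peer

proc singleSlowPeerLeft(ctx: SyncCtx; peer: SyncPeer; nPeers: int): bool =
  # Helper other code might use, e.g. to start importing blocks while just
  # one slow sync peer is left instead of waiting for the queue to fill.
  nPeers == 1 and not ctx.lastSlowPeer.isNil and ctx.lastSlowPeer == peer
```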
mjfh added a commit that referenced this pull request May 13, 2025
* Discard peer immediately after `PeerDisconnected` exception

why:
  Otherwise the usual drill would apply, i.e. the fetch would be retried
  repeatedly until the maximum number of failures is reached.

* Register last slow sync peer

why:
  Previously (before PR #3204) the sync peer simply would have been
  zombified and discarded so that there were no sync peers left.

  Then PR #3204 introduced a concept of ignoring any error of the last
  peer via the `infectedByTVirus()` function. This opened a can of worms
  which was mitigated by PR #3269 by only keeping the last sync peer
  non-zombified if it was labelled `slow`.

  The last measure can lead to a heavy syncer slow down while queuing
  blocks if there is only a slow peer available. It will try to fill
  the queue first while it makes more sense to import blocks allowing
  the syncer to collect more sync peers.

  This patch registers the current peer as the last one labelled slow.
  It is up to other functions to exploit that fact.

* Also start import while there is only one slow sync peer left.

why:
  See explanation on previous patch.

* Remove stray debugging statement