Verified Proxy: Pre-fetch state using eth_createAccessList and add caching to EVM calls #3373

Draft
wants to merge 136 commits into base: eth-call-vp

Conversation

bhartnett
Contributor

This is an example of how we can add caching to improve the performance of the RPC endpoints in the verified proxy that use the EVM.

I've also implemented an optimization: we call the downstream eth_createAccessList RPC endpoint to discover the accounts and storage keys the call is expected to touch, fetch all of that state using eth_getProof (the slots for each account are batched into a single request), and put the state into the caches before executing the EVM call.
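
A minimal sketch of that flow, in Python for illustration (the actual change is in Nim; the provider URL and transaction object here are placeholders):

```python
import requests

PROVIDER_URL = "http://localhost:8545"  # placeholder downstream provider

def rpc(method: str, params: list):
    resp = requests.post(PROVIDER_URL, json={
        "jsonrpc": "2.0", "id": 1, "method": method, "params": params})
    return resp.json()["result"]

def prefetch_call_state(tx: dict, block: str = "latest") -> list:
    # Ask the provider which accounts and storage slots the call is
    # expected to touch.
    access_list = rpc("eth_createAccessList", [tx, block])["accessList"]
    proofs = []
    for entry in access_list:
        # One eth_getProof per account, with all of its storage slots
        # batched into the same request.
        proofs.append(rpc("eth_getProof",
                          [entry["address"], entry["storageKeys"], block]))
    # The verified proxy would then verify each proof against the trusted
    # state root and insert the results into the account/storage caches
    # before running the EVM call locally.
    return proofs
```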

In my testing the pre-fetching provides a reasonable speed-up, but the verified eth_call is still slower than the unverified eth_call due to the additional network calls required, which is to be expected.

Adding support for connecting to the downstream RPC provider via WebSockets would likely improve performance further.

jangko and others added 30 commits May 10, 2025 18:54
If the `baseTxFrame` is not updated before `updateBase` yields to the
async event loop, other modules will access an expired `baseTxFrame`,
e.g. `getStatus` of eth/68 will crash the program.
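
A toy asyncio illustration of the hazard (Python, illustrative names only, not the Nim code): a reader interleaved with `updateBase` can observe a frame that expires mid-update.

```python
import asyncio

base_tx_frame = {"id": 1, "valid": True}

async def update_base():
    global base_tx_frame
    old = base_tx_frame
    await asyncio.sleep(0)        # yields mid-update: the bug window
    old["valid"] = False          # the old frame expires here
    base_tx_frame = {"id": 2, "valid": True}

async def get_status():
    frame = base_tx_frame         # may grab the soon-expired frame
    await asyncio.sleep(0.01)
    assert frame["valid"], "crash: read an expired baseTxFrame"

async def main():
    await asyncio.gather(get_status(), update_base())

asyncio.run(main())  # hits the assert; updating base_tx_frame before
                     # the first await closes the window
```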
* fix some Nim 2.2 warnings

* copyright year linting

* macOS Sonoma doesn't change the oldest supported x86 CPU type
* removes `_created` metrics from gauges (they should never have been
there)
* allow labelled metrics to be created from any thread
* Discard peer immediately after `PeerDisconnected` exception

why
  Otherwise the peer would follow the usual drill, which is to be
  repeatedly tried again until the maximum number of failures is
  reached.

* Register last slow sync peer

why:
  Previously (before PR #3204) the sync peer simply would have been
  zombified and discarded so that there were no sync peers left.

  Then PR #3204 introduced a concept of ignoring any error of the last
  peer via the `infectedByTVirus()` function. This opened a can of worms
  which was mitigated by PR #3269 by only keeping the last sync peer
  non-zombified if it was labelled `slow`.

  The last measure can lead to a heavy syncer slowdown while queuing
  blocks if there is only a slow peer available. It will try to fill
  the queue first, while it makes more sense to import blocks, allowing
  the syncer to collect more sync peers.

  This patch registers the current peer as the last one labelled slow.
  It is up to other functions to exploit that fact.

* Also start import while there is only one slow sync peer left.

why:
  See explanation on previous patch.

* Remove stray debugging statement
Add access from History network to historical summaries for the
verification of Capella and onwards block proofs.

Access is provided by adding the BeaconDbCache to the history
network, more specifically to the HeaderVerifier (previously called
Accumulators). This approach is taken, over providing callbacks,
as it is more in sync with how StateNetwork accesses the
HistoryNetwork. It might still be considered to move to
callbacks in the future though, as that could provide a more
"oracle" agnostic way of providing this data.

The BeaconDbCache is created because for Ephemeral headers
verification we will also need access to the Light client updates.
Aside from the Light client updates, the historical summaries
are also added to the cache in their decoded form for easy and
fast access on block verification.

Some changes are likely still required to avoid too many
copies of the summaries, TBI.
…sest connected portal client (#3278)

* Refactor state bridge to support sending each content to any of the connected portal clients sorted by distance from the content key.
When looking up a VertexID, the entry might not be present in the
database - this is currently not tracked since the functionality is
not commonly used. With path-based vertex id generation, however,
we'll be making guesses where empty lookups become "normal" - the same
would happen for incomplete databases as well.
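
A sketch of the idea in Python (toy types, not the actual Aristo API): make the empty lookup a typed, ordinary outcome rather than an untracked error.

```python
from typing import Optional

class VertexRef:
    def __init__(self, payload: bytes) -> None:
        self.payload = payload

vtx_table: dict[int, VertexRef] = {}

def get_vertex(vid: int) -> Optional[VertexRef]:
    # A miss returns None instead of raising; callers handle it explicitly.
    return vtx_table.get(vid)

vtx = get_vertex(42)
if vtx is None:
    # A "normal" empty lookup: a guessed path-based id, or an
    # incomplete database.
    pass
```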
…3283)

* Remove old wire protocol implementation

* eth68 status isolated

* eth69 preparation

* Fix typo

* Register protocol

* Add BlockRangeUpdate

* Use new receipt format for eth69

* Fix tests

* Update wire protocol setup

* Update syncer addObserver

* Update peer observer

* Handle blockRangeUpdate using peer state

* Add receipt69 roundtrip test

* Replace Receipt69 with StoredReceipt from nim-eth

* Bump nim-eth

* Bump nim-eth to master branch
* rm vestigial EIP-7873 support

* bump nim-eth and nim-web3
* Rename fluffy to portal/nimbus_portal_client

A bunch of renames to remove the fluffy naming and related changes

- fluffy dir -> portal dir
- fluffy binary -> nimbus_portal_client
- portal_bridge binary -> nimbus_portal_bridge
- + renamed related make targets
- Restructure of portal directory for the applications (client +
  bridge)
- Rename of default data dir for nimbus_portal_client and
nimbus_portal_bridge
- Remove most of fluffy naming in code / docs
- Move docker folder one level up
- Move grafana folder into metrics folder

Items that are of importance regarding backwards compatibility:
- Kept make targets for fluffy and portal_bridge
- The data dir is first checked to see if the legacy dir exists; if
not, the new dir is created
- The ENR file is currently still named fluffy_node.enr

* Move legacy files to new naming + create deprecation file

Also fix nimble tasks

* More fixes/changes

- Change lock file name to portal_node.lock
- Fix debug docker files
- Fix portal testnet script

* Mass replace for fluffy/Fluffy and portal_bridge in docs/comments

* Address feedback regarding usage of binary name
* hoodi chain config: fix shanghai time typo

* Validate built in chain config

* Override compiler side effect analysis
* eth/69: Disconnect peer when receive invalid blockRangeUpdate

* Add trace log when disconnecting peer

* Use debug instead of trace to log blockRangeUpdate
* Code cosmetics, docu/comment and logging updates, etc.

* Explicitly limit header queue length

why
  Beware of outliers (remember the law of the iterated logarithm.)

also
  No need to reorg the header queue anymore. This was a pre-PR #3125
  feature which was needed to curb the queue when it grew too large.

  This cannot happen anymore as there is always a deterministic fetch
  that can solve any immediate gap preventing the queue from
  serialising headers.

* Fix issue #3298

reason for crash
  The syncer will stop trying to download headers after failing on 30
  different sync peers.

  The state machine will advance to `cancelHeaders` causing all sync
  peers to stop as soon as they can without updating the bookkeeping
  for unprocessed headers which might leave the `books` in an open or
  non-finalised state.

  Unfortunately, when synchronising all simultaneously running sync
  peers, the *books* were checked for being sort of finalised already
  before cleaning up (aka finalising.)

* Remove `--debug-beacon-sync-blocks-queue-hwm` command line option

why
  Not needed anymore as the block queue will run on a smaller memory
  footprint, anyway.

* Allow downloading blocks while importing/executing simultaneously

why
  The last PRs merged seem to have made a change, presumably in the
  `FC` module running `async`. This allows for importing/executing
  blocks while fetching new ones at the same time without depleting
  sync peers.

  Previously, all sync peers were gone after a while when doing this.

* Move out blocks import section as a separate source module

* Reorg blocks download and import/execute

why
  Blocks download and import is now modelled after how it is done for
  the headers:
    + if a sync peer can import right at the top of the `FC` module,
      download a list of blocks and import right away
    + Otherwise, if a sync peer cannot directly import, then download
      and queue a list of blocks if there is space on the queue
  As a separate pseudo task, fetch a list of blocks from the queue if
  it can be imported right at the top of the `FC` module (see the
  sketch below)
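
A self-contained toy sketch of those two paths (Python, invented names):

```python
top = 100                      # height at the top of the (toy) FC module
staged: dict[int, list] = {}   # first block number -> list of blocks
MAX_STAGED = 8

def on_blocks_fetched(first: int, blocks: list) -> None:
    """Fast path: import right at the top; otherwise queue if there is room."""
    global top
    if first == top + 1:
        top += len(blocks)               # import right away
    elif len(staged) < MAX_STAGED:
        staged[first] = blocks           # park on the queue

def drain_queue() -> None:
    """Pseudo task: import staged batches once they reach the top."""
    global top
    while top + 1 in staged:
        top += len(staged.pop(top + 1))
```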
* Restructure portal bridge folders.
- Hive has been updated from fluffy to nimbus-portal
- Docker hub repo nimbus-portal-client has been created and latest
build has been added there
- Update grafana dashboard to the latest one used for our fleet
- Update that grafana dashboard to use Nimbus Portal naming
- Remove some left-over fluffy naming
…offers (#3303)

* Remove offer workers and replace using rate limiter for offers.

* Use asyncSpawn for triggerPoke.

* Add content queue workers to history and beacon networks.

* Test random gossip and neighborhood gossip.
* swap block cache for header store; refactor

* format

* review and fixes

* add tests for header store

* remove unused headers

* review and fixes

* fixes

* fix copyright info

* fix copyright year

* check order

* earliest finalized

* make cache len hidden
The current_sync_committee_gindex is fork dependent; this causes
bootstrap validation issues since Electra.
jangko and others added 26 commits June 30, 2025 14:43
Each column family in rocksdb requires its own set of SST files that
must be kept open, cached etc. Further, wal files are deleted only
once all column families referencing them have been flushed, meaning
that low-volume families like Adm can keep them around far longer than
makes sense.

Adm contains only two useful metadata entries and therefore it doesn't
really make sense to allocate an entire CF for it.

Consolidating Adm into Vtx also makes it easier to reason about the
internal consistency of the Vtx table - even though rocksdb ensures
atomic cross-cf writes via the wal, it requires using a special batch
write API that introduces its own overhead. With this change, we don't
need to rely on this particular rocksdb feature to maintain atomic
consistency within Vtx.

Databases using the old schema are supported but rollback is not (i.e.
the old metadata format/CF is read but not written).
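
A dictionary-based sketch of the consolidation (Python; the key layout is hypothetical, not the actual on-disk schema): the two Adm metadata entries live in the Vtx table under reserved keys, so one ordinary single-table batch covers both kinds of writes.

```python
ADM_PREFIX = b"\x00adm:"  # reserved prefix assumed to collide with no vertex key

def adm_key(name: bytes) -> bytes:
    return ADM_PREFIX + name

vtx_table: dict[bytes, bytes] = {}  # stand-in for the Vtx column family

def write_batch(vertices: dict[bytes, bytes],
                metadata: dict[bytes, bytes]) -> None:
    # Everything lands in one table, so atomicity of the batch no longer
    # depends on rocksdb's special cross-CF write API.
    batch = dict(vertices)
    batch.update({adm_key(k): v for k, v in metadata.items()})
    vtx_table.update(batch)
```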
* update nim-eth

* point to master

* fix
* feat: add admin_peers and admin ns (#3431)

* feat: add admin_peers and admin ns

* fix redundant boolean checks and import std sections

* move caps in the main block

* setup admin and quit combined into one call

* fix compile issues

* Add export marker

* Fix tests

* Restore invalid request exception in admin_addPeer

* Chicken and egg

* oops

* fix: string -> int for ports.discovery and listener (#3438)

* fix: string -> int for ports.discovery and listener

* use int not hex

* fix test

* Add export marker

* Add comments

---------

Co-authored-by: Barnabas Busa <[email protected]>
…ng (#3440)

* Add cli param to enable stateless provider.

* Create execution witness type and implement encoding/decoding.
* Transform FC module internals into DAG

* Optimize updateFinalized

* no changes to chain_private

* More tuning
* add blocks support

* add rpc handlers

* reviews

* format

* catch only error exceptions

* remove unused imports

* review

* add basics tests

* fix
Deferred GC seemed like a good idea to reduce the amount of work done
during block processing, but a side effect of this is that more memory
ends up being allocated in certain workloads which in turn causes an
overall slowdown, with a long test showing a net performance effect that
hovers around 0% and more memory usage.

In particular, the troublesome range around 2M sees a 10-15% slowdown
and an ugly memory usage spike.

Reverting for now - it might be worth revisiting in the future under
different memory allocation patterns, but as usual, it's better to not
do work at all (like in #3444) than to do work faster.

This reverts commit 3a00915.
Every time we persist, we collect all changes into a batch and write
that batch to a memtable which rocksdb lazily will write to disk using a
background thread.

The default implementation of the memtable in rocksdb is a skip list
which can handle concurrent writes while still allowing lookups. We're
not using concurrent inserts and the skip list comes with significant
overhead both when writing and when reading.

Here, we switch to a vector memtable which is faster to write but
terrible to read. To compensate, we then proceed to flush the memtable
eagerly to disk which is a blocking operation.

One would think that blocking the main thread would be bad, but it
turns out that creating the skip list, also a blocking operation, is
even slower, resulting in a net win.

Coupled with this change, we also make the "lower" levels bigger,
effectively reducing the average number of levels that must be looked
at to find recently written data. This could lead to some write
amplification, which is offset by making each file smaller and
therefore making compactions more targeted.

Taken together, this results in an overall import speed boost of about
3-4%, but above all, it reduces the main thread blocking time during
persist.

pre (for 8k blocks persisted around block 11M):
```
DBG 2025-07-03 15:58:14.053+02:00 Core DB persisted
kvtDur=8ms182us947ns mptDur=4s640ms879us492ns endDur=10s50ms862us669ns
stateRoot=none()
```

post:
```
DBG 2025-07-03 14:48:59.426+02:00 Core DB persisted
kvtDur=12ms476us833ns mptDur=4s273ms629us840ns endDur=3s331ms171us989ns
stateRoot=none()
```
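
A rough stand-alone analogy of the write-side tradeoff (Python, not rocksdb itself): keeping the structure ordered on every insert versus appending to a plain vector and paying one sort at flush time.

```python
import bisect, random, time

keys = [random.randbytes(8) for _ in range(100_000)]

t0 = time.perf_counter()
ordered: list[bytes] = []
for k in keys:
    bisect.insort(ordered, k)    # pay the ordering cost on every write
t1 = time.perf_counter()

vector: list[bytes] = []
for k in keys:
    vector.append(k)             # cheap writes...
vector.sort()                    # ...one ordering cost at "flush" time
t2 = time.perf_counter()

print(f"ordered-on-insert {t1-t0:.2f}s vs append-then-sort {t2-t1:.2f}s")
```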
When updates to the MPT happen, a new VertexRef is allocated every time - this
keeps the code simple but has the significant downside that updates cause
unnecessary allocations.

Instead of allocating a new `VertexRef` on every update, we can update the
existing one provided that it is not shared. We can prevent it from being shared
by copying it eagerly when it's added to the layer. A downside of this approach
is that we also have to make a copy when invalidating hash keys, which affects
branch and account nodes mainly.

The tradeoff seems well worth it though, especially for imports, which
clock a nice perf boost, like in this little test:

```
(21005462, 21008193]  14.46  15.50  2,479.35  2,656.98  9m26s  8m48s   7.16%   7.16%   -6.69%
(21013654, 21016385]  15.28  16.14  2,523.74  2,665.83  8m56s  8m27s   5.63%   5.63%   -5.33%
(21021846, 21024577]  15.52  17.66  2,539.25  2,889.61  8m47s  7m43s  13.80%  13.80%  -12.12%

blocks: 16384, baseline: 27m10s, contender: 24m59s
Time (total): -2m10s, -8.00%
```
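
A minimal Python sketch of the copy-on-first-touch scheme described above (illustrative types): copy a vertex once, eagerly, when the layer first takes it; afterwards the layer owns it and can mutate it in place.

```python
import copy

class VertexRef:
    def __init__(self, data: dict) -> None:
        self.data = data

class Layer:
    def __init__(self) -> None:
        self.owned: dict[int, VertexRef] = {}

    def for_update(self, vid: int, shared: VertexRef) -> VertexRef:
        if vid not in self.owned:
            # One eager copy on first touch guarantees it is never shared.
            self.owned[vid] = copy.deepcopy(shared)
        return self.owned[vid]   # safe to mutate in place from now on

layer = Layer()
v = layer.for_update(1, VertexRef({"key": b""}))
v.data["key"] = b"\x01"          # later updates allocate nothing new
```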
fixes eth2 pointing to branch commit instead of unstable
* small bugfixes and cleanups across the board
…runtime (#3448)

* Enable collection of witness keys in ledger at runtime via statelessProviderEnabled flag.
* Simplify FC node coloring

* Optimize updateFinalized
* Schedule orphan block processing to the async worker

* update processQueue

* existedBlock to existingBlock
* Schedule `updateBase` to asynchronous worker.

`updateBase` becomes synchronous and the scheduler will interleave
`updateBase` with `importBlock` and `forkChoice`.

The scheduler will move the base at fixed size `PersistBatchSize`.

* Remove persistBatchQueue and keep persistBatchSize

* fix tests

* queueUpdateBase tuning

* Fix updateBase scheduler

* Optimize updateBase and queueUpdateBase a bit
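
A hedged asyncio sketch of the scheduling idea (toy names and sizes, not the Nim scheduler): `updateBase` itself is synchronous, and a single worker interleaves it with the other queued operations, moving the base in fixed `PersistBatchSize` steps.

```python
import asyncio

PERSIST_BATCH_SIZE = 32
base, head = 0, 1024

def update_base() -> None:
    # Synchronous, fixed-size step of the base.
    global base
    base = min(base + PERSIST_BATCH_SIZE, head)

def queue_update_base(queue: asyncio.Queue) -> None:
    # Only schedule a step once a full batch is available.
    if head - base >= PERSIST_BATCH_SIZE:
        queue.put_nowait(update_base)

async def worker(queue: asyncio.Queue) -> None:
    # Interleaves updateBase with importBlock/forkChoice, one at a time.
    while True:
        task = await queue.get()
        task()
        queue.task_done()
```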