Verified Proxy: Pre-fetch state using eth_createAccessList and add caching to EVM calls #3373
Draft · bhartnett wants to merge 136 commits into eth-call-vp from eth-call-vp-cache-state-lookups
Conversation
If the `baseTxFrame` is not updated and `updateBase` yields to the async event loop, other modules will access the expired `baseTxFrame`; e.g. `getStatus` of eth/68 will crash the program.
* fix some Nim 2.2 warnings * copyright year linting * macOS Sonoma doesn't change the oldest supported x86 CPU type
* removes `_created` metrics from gauges (they should never have been there) * allow labelled metrics to be created from any thread
* Discard peer immediately after `PeerDisconnected` exception. Why: otherwise it would follow the usual drill of being repeatedly retried until the maximum number of failures is reached.
* Register last slow sync peer. Why: previously (before PR #3204) the sync peer would simply have been zombified and discarded so that no sync peers were left. PR #3204 then introduced a concept of ignoring any error of the last peer via the `infectedByTVirus()` function. This opened a can of worms which was mitigated by PR #3269 by only keeping the last sync peer non-zombified if it was labelled `slow`. That measure can lead to a heavy syncer slowdown while queuing blocks if only a slow peer is available: it will try to fill the queue first, while it makes more sense to import blocks, allowing the syncer to collect more sync peers. This patch registers the current peer as the last one labelled slow; it is up to other functions to exploit that fact.
* Also start import while there is only one slow sync peer left. Why: see explanation for the previous patch.
* Remove stray debugging statement.
Add access from the History network to historical summaries for the verification of Capella and onwards block proofs. Access is provided by adding the BeaconDbCache to the history network, more specifically to the HeaderVerifier (previously called Accumulators). This approach is taken over providing callbacks, as it is more in sync with how StateNetwork accesses the HistoryNetwork. It might still be considered to move to callbacks in the future though, as that could provide a more "oracle"-agnostic way of providing this data. The BeaconDbCache is created because for Ephemeral headers verification we will also need access to the Light client updates. Aside from the Light client updates, the historical summaries are also added to the cache in decoded form for easy and fast access on block verification. Some changes are likely still required to avoid too many copies of the summaries, TBI.
* Bump nim-eth to latest.
…c node in handleFindContent (#3273)
…sest connected portal client (#3278) * Refactor state bridge to support sending each content to any of the connected portal clients sorted by distance from the content key.
When looking up a VertexID, the entry might not be present in the database. This is currently not tracked since the functionality is not commonly used. With path-based vertex id generation, however, we'll be making guesses where empty lookups become "normal"; the same would happen for incomplete databases as well.
…3283) * Remove old wire protocol implementation * eth68 status isolated * eth69 preparation * Fix typo * Register protocol * Add BlockRangeUpdate * Use new receipt format for eth69 * Fix tests * Update wire protocol setup * Update syncer addObserver * Update peer observer * Handle blockRangeUpdate using peer state * Add receipt69 roundtrip test * Replace Receipt69 with StoredReceipt from nim-eth * Bump nim-eth * Bump nim-eth to master branch
* rm vestigial EIP-7873 support * bump nim-eth and nim-web3
* Rename fluffy to portal/nimbus_portal_client. A bunch of renames to remove the fluffy naming and related changes:
  - fluffy dir -> portal dir
  - fluffy binary -> nimbus_portal_client
  - portal_bridge binary -> nimbus_portal_bridge
  - renamed related make targets
  - restructure of the portal directory for the applications (client + bridge)
  - rename of default data dir for nimbus_portal_client and nimbus_portal_bridge
  - remove most of the fluffy naming in code / docs
  - move docker folder one level up
  - move grafana folder into metrics folder
  Items of importance regarding backwards compatibility:
  - kept make targets for fluffy and portal_bridge
  - data dir is first checked to see if the legacy dir exists; if not, the new dir is created
  - ENR file is currently still named fluffy_node.enr
* Move legacy files to new naming + create deprecation file. Also fix nimble tasks.
* More fixes/changes:
  - change lock file name to portal_node.lock
  - fix debug docker files
  - fix portal testnet script
* Mass replace for fluffy/Fluffy and portal_bridge in docs/comments
* Address feedback regarding usage of binary name
* hoodi chain config: fix shanghai time typo * Validate built in chain config * Override compiler side effect analysis
* eth/69: Disconnect peer when receiving an invalid blockRangeUpdate * Add trace log when disconnecting peer * Use debug instead of trace to log blockRangeUpdate
* Code cosmetics, docu/comment and logging updates, etc.
* Explicitly limit header queue length. Why: beware of outliers (remember the law of the iterated logarithm.) Also, there is no need to reorg the header queue anymore. This was a pre-PR #3125 feature which was needed to curb the queue when it grew too large. This cannot happen anymore as there is always a deterministic fetch that can solve any immediate gap, preventing the queue from serialising headers.
* Fix issue #3298. Reason for crash: the syncer will stop trying to download headers after failing on 30 different sync peers. The state machine will advance to `cancelHeaders`, causing all sync peers to stop as soon as they can without updating the bookkeeping for unprocessed headers, which might leave the `books` in an open or non-finalised state. Unfortunately, when synchronising all simultaneously running sync peers, the *books* were checked for sort of being finalised already before cleaning up (aka finalising.)
* Remove `--debug-beacon-sync-blocks-queue-hwm` command line option. Why: not needed anymore as the block queue will run on a smaller memory footprint anyway.
* Allow downloading blocks while importing/executing simultaneously. Why: the last PRs merged seem to have made a change, presumably in the `FC` module running `async`. This allows for importing/executing blocks while fetching new ones at the same time without depleting sync peers. Previously, all sync peers were gone after a while when doing this.
* Move out the blocks import section as a separate source module.
* Reorg blocks download and import/execute. Why: blocks download and import is now modelled after how it is done for the headers:
  + if a sync peer can import right at the top of the `FC` module, download a list of blocks and import right away
  + otherwise, if a sync peer cannot directly import, then download and queue a list of blocks if there is space on the queue
  As a separate pseudo task, fetch a list of blocks from the queue if it can be imported right at the top of the `FC` module.
* Restructure portal bridge folders.
- Hive has been updated from fluffy to nimbus-portal - Docker hub repo nimbus-portal-client has been created and latest build has been added there
- Update grafana dashboard to the latest one used for our fleet - Update that grafana dashboard to use Nimbus Portal naming - Remove some left-over fluffy naming
…offers (#3303) * Remove offer workers and replace them with a rate limiter for offers. * Use asyncSpawn for triggerPoke. * Add content queue workers to history and beacon networks. * Test random gossip and neighborhood gossip.
* swap block cache for header store; refactor * format * review and fixes * add tests for header store * remove unused headers * review and fixes * fixes * fix copyright info * fix copyright year * check order * earliest finalized * make cache len hidden
The current_sync_committee_gindex is fork dependent, which causes a bootstrap validation issue since Electra.
Each column family in rocksdb requires its own set of SST files that must be kept open, cached, etc. Further, wal files are deleted only once all column families referencing them have been flushed, meaning that low-volume families like Adm can keep them around far longer than makes sense. Adm contains only two useful metadata entries, and therefore it doesn't really make sense to allocate an entire CF for it. Consolidating Adm into Vtx also makes it easier to reason about the internal consistency of the Vtx table: even though rocksdb ensures atomic cross-CF writes via the wal, it requires using a special batch write API that introduces its own overhead. With this change, we don't need to rely on this particular rocksdb feature to maintain atomic consistency within Vtx. Databases using the old schema are supported but rollback is not (i.e. the old metadata format/CF is read but not written).
* update nim-eth * point to master * fix
* feat: add admin_peers and admin ns (#3431)
  - feat: add admin_peers and admin ns
  - fix redundant boolean checks and import std sections
  - move caps in the main block
  - setup admin and quit combined into one call
  - fix compile issues
  - add export marker
  - fix tests
  - restore invalid request exception in admin_addPeer
  - chicken and egg
  - oops
* fix: string -> int for ports.discovery and listener (#3438)
  - fix: string -> int for ports.discovery and listener
  - use int not hex
  - fix test
  - add export marker
  - add comments

Co-authored-by: Barnabas Busa <[email protected]>
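As a quick illustration of the new `admin` namespace, this is roughly how one could query `admin_peers` over HTTP JSON-RPC once the endpoint is enabled. The URL and port are assumptions for a local node, not values taken from this PR; treat it as a usage sketch only.

```python
import json
import requests

RPC_URL = "http://127.0.0.1:8545"  # assumed local HTTP RPC endpoint

def rpc(method: str, params: list | None = None) -> dict:
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    response = requests.post(RPC_URL, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()

# List the peers the node is currently connected to.
print(json.dumps(rpc("admin_peers"), indent=2))
```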
…ng (#3440) * Add cli param to enable stateless provider. * Create execution witness type and implement encoding/decoding.
* Transform FC module internals into DAG * Optimize updateFinalized * no changes to chain_private * More tuning
* add blocks support * add rpc handlers * reviews * format * catch only error exceptions * remove unused imports * review * add basics tests * fix
Deferred GC seemed like a good idea to reduce the amount of work done during block processing, but a side effect of this is that more memory ends up being allocated in certain workloads, which in turn causes an overall slowdown, with a long test showing a net performance effect that hovers around 0% and more memory usage. In particular, the troublesome range around 2M sees a 10-15% slowdown and an ugly memory usage spike. Reverting for now - it might be worth revisiting in the future under different memory allocation patterns, but as usual, it's better to not do work at all (like in #3444) than to do work faster. This reverts commit 3a00915.
Every time we persist, we collect all changes into a batch and write that batch to a memtable which rocksdb will lazily write to disk using a background thread. The default memtable implementation in rocksdb is a skip list, which can handle concurrent writes while still allowing lookups. We're not using concurrent inserts, and the skip list comes with significant overhead both when writing and when reading. Here, we switch to a vector memtable which is faster to write but terrible to read. To compensate, we then proceed to flush the memtable eagerly to disk, which is a blocking operation. One would think that blocking the main thread would be bad, but it turns out that creating the skip list, also a blocking operation, is even slower, resulting in a net win. Coupled with this change, we also make the "lower" levels bigger, effectively reducing the average number of levels that must be looked at to find recently written data. This could lead to some write amplification, which is offset by making each file smaller and therefore making compactions more targeted. Taken together, this results in an overall import speed boost of about 3-4%, but above all, it reduces the main thread blocking time during persist.

pre (for 8k blocks persisted around block 11M):
```
DBG 2025-07-03 15:58:14.053+02:00 Core DB persisted kvtDur=8ms182us947ns mptDur=4s640ms879us492ns endDur=10s50ms862us669ns stateRoot=none()
```
post:
```
DBG 2025-07-03 14:48:59.426+02:00 Core DB persisted kvtDur=12ms476us833ns mptDur=4s273ms629us840ns endDur=3s331ms171us989ns stateRoot=none()
```
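For readers unfamiliar with the rocksdb knob being discussed, here is a rough sketch using the python-rocksdb bindings of what selecting a vector memtable looks like. Nimbus uses its own Nim rocksdb bindings, the option names here are the Python binding's, and the eager-flush step is not shown, so this is purely illustrative of the technique, not the PR's code.

```python
import rocksdb  # python-rocksdb bindings, used only for illustration

opts = rocksdb.Options(create_if_missing=True)

# Vector memtable: appends are cheap, but reads from the memtable are slow,
# which is why the change pairs it with an eager flush to disk (not shown).
opts.memtable_factory = rocksdb.VectorMemtableFactory()

# Larger write buffers / base file sizes reduce how many levels a read has
# to consult for recently written data.
opts.write_buffer_size = 64 * 1024 * 1024
opts.target_file_size_base = 64 * 1024 * 1024

db = rocksdb.DB("example.db", opts)
db.put(b"key", b"value")
```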
When updates to the MPT happen, a new VertexRef is allocated every time. This keeps the code simple but has the significant downside that updates cause unnecessary allocations. Instead of allocating a new `VertexRef` on every update, we can update the existing one provided that it is not shared. We can prevent it from being shared by copying it eagerly when it's added to the layer. A downside of this approach is that we also have to make a copy when invalidating hash keys, which affects branch and account nodes mainly. The tradeoff seems well worth it though, especially for imports, which clock a nice perf boost, as in this little test:
```
(21005462, 21008193] 14.46 15.50 2,479.35 2,656.98 9m26s 8m48s  7.16%  7.16%  -6.69%
(21013654, 21016385] 15.28 16.14 2,523.74 2,665.83 8m56s 8m27s  5.63%  5.63%  -5.33%
(21021846, 21024577] 15.52 17.66 2,539.25 2,889.61 8m47s 7m43s 13.80% 13.80% -12.12%

blocks: 16384, baseline: 27m10s, contender: 24m59s
Time (total): -2m10s, -8.00%
```
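To make the copy-on-first-touch idea concrete, here is a loose Python analogy; the `Vertex` and `Layer` names are invented for this sketch and do not mirror the Aristo types. The point is simply that a vertex is copied once when a layer takes ownership of it, after which every further update mutates that private copy in place instead of allocating a new one.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Vertex:
    # Invented stand-in for a trie vertex: nibble -> child reference.
    children: Dict[int, bytes] = field(default_factory=dict)

class Layer:
    """Owns private copies of the vertices it has touched.

    A vertex is copied exactly once, when it enters the layer; after that it
    cannot be shared with anyone else, so updates may mutate it in place.
    """

    def __init__(self) -> None:
        self._owned: Dict[int, Vertex] = {}

    def vertex_for_update(self, vid: int, shared: Dict[int, Vertex]) -> Vertex:
        vtx = self._owned.get(vid)
        if vtx is None:
            # Eager copy on first touch; later updates reuse this allocation.
            vtx = Vertex(children=dict(shared[vid].children))
            self._owned[vid] = vtx
        return vtx

# Repeated updates to the same vertex allocate only one copy.
shared_db = {1: Vertex(children={0: b"a"})}
layer = Layer()
for nibble in range(3):
    layer.vertex_for_update(1, shared_db).children[nibble] = b"x"
assert shared_db[1].children == {0: b"a"}  # the shared vertex is untouched
```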
fixes eth2 pointing to branch commit instead of unstable
* small bugfixes and cleanups across the board
…runtime (#3448) * Enable collection of witness keys in ledger at runtime via statelessProviderEnabled flag.
* Simplify FC node coloring * Optimize updateFinalized
* Schedule orphan block processing to the async worker * update processQueue * existedBlock to existingBlock
* Schedule `updateBase` to the asynchronous worker. `updateBase` becomes synchronous and the scheduler will interleave `updateBase` with `importBlock` and `forkChoice`. The scheduler will move the base at a fixed `PersistBatchSize`.
* Remove persistBatchQueue and keep persistBatchSize
* Fix tests
* queueUpdateBase tuning
* Fix updateBase scheduler
* Optimize updateBase and queueUpdateBase a bit
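A toy asyncio sketch of the scheduling idea; names such as `PERSIST_BATCH_SIZE`, `import_blocks` and `update_base` are placeholders rather than the Nim APIs. Base updates are handed to a worker at a fixed batch interval so they interleave with ongoing block import instead of blocking it.

```python
import asyncio

PERSIST_BATCH_SIZE = 4  # assumed stand-in for the PersistBatchSize constant

async def update_base(head: int) -> None:
    # The base move itself is plain sequential work; the scheduler decides
    # when it runs relative to block import.
    print(f"base moved up to block {head}")

async def import_blocks(queue: asyncio.Queue, n_blocks: int) -> None:
    for height in range(1, n_blocks + 1):
        await asyncio.sleep(0)          # pretend to execute a block
        if height % PERSIST_BATCH_SIZE == 0:
            # Instead of calling update_base inline, hand it to the worker so
            # it interleaves with further importBlock/forkChoice work.
            await queue.put(height)
    await queue.put(None)               # signal shutdown

async def worker(queue: asyncio.Queue) -> None:
    while (head := await queue.get()) is not None:
        await update_base(head)

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(import_blocks(queue, 10), worker(queue))

asyncio.run(main())
```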
* eth-call * format * fix
This is an example of how we can add caching to improve the performance of the RPC endpoints that use the EVM in the verified proxy.

I've also implemented an optimization where we call the downstream `eth_createAccessList` RPC endpoint to pre-fetch the expected account and storage keys, then fetch all the state using `eth_getProof` (slots for each account are batched together), and then put the state in the caches before executing the EVM call. In my testing the pre-fetching provides a reasonable speed-up, but the verified `eth_call` is still slower than the unverified `eth_call` due to the additional network calls required, which is to be expected.
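A minimal sketch of the pre-fetch flow described above, expressed as raw JSON-RPC calls in Python. The endpoint URL, the `prefetch_state` helper and the example call are illustrative placeholders, not code from this PR; the actual implementation runs inside the verified proxy and verifies the returned proofs against a trusted state root before caching them.

```python
import requests  # plain HTTP JSON-RPC against the downstream provider

RPC_URL = "https://example-rpc.invalid"  # hypothetical downstream endpoint

def rpc(method: str, params: list) -> dict:
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    return requests.post(RPC_URL, json=payload, timeout=30).json()["result"]

def prefetch_state(call_args: dict, block: str) -> list[dict]:
    # 1. Ask the provider which accounts and storage slots the call is
    #    expected to touch.
    access_list = rpc("eth_createAccessList", [call_args, block])["accessList"]

    # 2. Fetch a proof per touched account, batching all of that account's
    #    storage keys into a single eth_getProof request.
    proofs = []
    for entry in access_list:
        proofs.append(
            rpc("eth_getProof", [entry["address"], entry["storageKeys"], block])
        )
    # The proxy would verify these proofs and seed its caches with the result
    # before running the EVM call locally.
    return proofs

call = {"to": "0x6b175474e89094c44da98b954eedeac495271d0f",  # DAI
        "data": "0x18160ddd"}                                # totalSupply()
state_proofs = prefetch_state(call, "latest")
```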
Adding support for connecting to the downstream RPC provider via WebSockets would likely improve performance further.