Warp Sync: Alternative snapshot formats #8565
Description
We want to investigate new snapshot formats that improve on the current one in the following respects:
- Generation of (partial) snapshots on nodes
- Reusability of chunks between snapshots to make finding them easier
- Trustworthiness of data
Snapshot chunks are currently divided into two categories:
- state chunks encode the entire account state of the blockchain at some block
- security chunks provide corroboration for whether that block is valid without syncing the whole chain
The "security" chunks will look completely different from one consensus engine to another, but they will usually contain reusable data. For example, validator-set based consensus systems prove security with a series of bootstrapped handoffs (as long as we relax the weak subjectivity model to assume that old validator sets remain uncorrupted). All finalized handoffs can be reused, although their proofs are usually small enough that all the handoffs fit in a single chunk. Depending on the state churn and the distance between snapshots, we may also be able to reuse some state chunks.
One possibility is a keystone/delta model: intermittent "full" state snapshots are taken every N*K blocks, and the snapshots between them (every K blocks) store only deltas over that full state.
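To make the keystone/delta idea concrete, here is a rough sketch. It assumes that each delta is taken against the most recent keystone (one reading of "deltas over that state") and models state as a plain mapping from account to value. The constants `K` and `N` and all function names are hypothetical, chosen only for illustration.

```python
from typing import Dict, Optional

K = 5000  # hypothetical: a snapshot is taken every K blocks
N = 6     # hypothetical: every N-th snapshot is a full keystone

# Toy state: account -> value. In a delta, None marks a deleted account.
State = Dict[str, Optional[int]]


def snapshot_kind(block: int, k: int = K, n: int = N) -> Optional[str]:
    """Return 'keystone', 'delta', or None if no snapshot is due at this block."""
    if block % k != 0:
        return None
    return "keystone" if block % (n * k) == 0 else "delta"


def keystone_for(block: int, k: int = K, n: int = N) -> int:
    """Block number of the keystone that a snapshot at `block` builds on."""
    return block - block % (n * k)


def make_delta(base: State, current: State) -> State:
    """Entries that changed or appeared since `base`, plus deletion markers."""
    delta: State = {a: v for a, v in current.items() if base.get(a) != v}
    delta.update({a: None for a in base if a not in current})
    return delta


def apply_delta(base: State, delta: State) -> State:
    """Rebuild the full state from a keystone and one delta."""
    out = dict(base)
    for account, value in delta.items():
        if value is None:
            out.pop(account, None)
        else:
            out[account] = value
    return out
```

With this layout a syncing node needs at most two pieces: the keystone and the single delta for its target block. The keystone chunks are shared by every snapshot in the same N*K window, which is where the reuse comes from.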
One major problem with the current snapshot system is that it is too heavy for most nodes to produce a snapshot before they prune the state that it encodes from their database. State chunks are currently very tightly packed using a method that makes it impossible to determine which account a chunk starts at, or the exact data of the account entry, without having produced all the chunks before it. One possibility is to design a predictable scheme for chunk boundaries, which would allow nodes to produce some of the state chunks but not all.
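One way such a predictable boundary scheme could look is to split the hashed account key space into fixed ranges by key prefix, so that the range covered by chunk `i` is known up front and any node can build that chunk on its own. The sketch below is an assumption, not a proposal from this issue: `PREFIX_BITS`, the function names, and the use of SHA-256 as a stand-in for the real account hash are all illustrative.

```python
import hashlib
from typing import Dict, List, Tuple

PREFIX_BITS = 8  # hypothetical: 2**8 = 256 chunks, fixed in advance
KEY_BITS = 256   # width of the hashed account key


def account_key(address: bytes) -> bytes:
    """Hashed account key. SHA-256 is only a stand-in for the real hash."""
    return hashlib.sha256(address).digest()


def chunk_index(key: bytes, prefix_bits: int = PREFIX_BITS) -> int:
    """Chunk that a key belongs to: its top `prefix_bits` bits."""
    return int.from_bytes(key, "big") >> (len(key) * 8 - prefix_bits)


def chunk_bounds(index: int, prefix_bits: int = PREFIX_BITS) -> Tuple[int, int]:
    """Half-open key range [lo, hi) covered by chunk `index`."""
    shift = KEY_BITS - prefix_bits
    return index << shift, (index + 1) << shift


def build_chunk(state: Dict[bytes, bytes], index: int) -> List[Tuple[bytes, bytes]]:
    """Produce one chunk in key order without producing any other chunk."""
    lo, hi = chunk_bounds(index)
    return sorted(
        (key, entry)
        for key, entry in state.items()
        if lo <= int.from_bytes(key, "big") < hi
    )
```

Fixed ranges trade packing density for independence: chunk sizes vary with how many accounts fall into each range, but a node can produce just the chunks it has time for before pruning.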
We can augment this scheme with random sampling: nodes which don't produce full snapshots will randomly sample some accounts and produce account entries for them, which they will keep on disk. They will refuse to propagate any snapshot where their random sample doesn't match the data in the snapshot. Assuming all repropagating nodes have their own random sample and a sufficiently large network, this makes it very unlikely for bad snapshot data to make its way through to unsynced nodes.
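The sampling check can be sketched as follows, under simplifying assumptions: state and snapshot are modelled as mappings from account key to entry, samples are drawn uniformly, and all function names are hypothetical. `pass_probability` gives the chance that a single node with a uniform sample of `sample_size` accounts fails to notice a snapshot in which `bad` out of `total` entries are wrong.

```python
import random
from math import comb
from typing import Dict, Hashable


def take_sample(state: Dict[Hashable, object], size: int,
                rng: random.Random) -> Dict[Hashable, object]:
    """Pick random accounts and keep their entries (persisted to disk in practice)."""
    keys = rng.sample(sorted(state), min(size, len(state)))
    return {key: state[key] for key in keys}


def should_propagate(sample: Dict[Hashable, object],
                     snapshot: Dict[Hashable, object]) -> bool:
    """Refuse to relay a snapshot that disagrees with any sampled entry."""
    return all(snapshot.get(key) == entry for key, entry in sample.items())


def pass_probability(total: int, bad: int, sample_size: int) -> float:
    """Chance that one node's uniform sample contains none of the bad entries."""
    return comb(total - bad, sample_size) / comb(total, sample_size)
```

If nodes sample independently, the chance that a bad snapshot gets past `h` relaying nodes in a row is `pass_probability(...) ** h`, which shrinks quickly as either the sample size or the number of hops grows.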