This repository was archived by the owner on Nov 6, 2020. It is now read-only.

Warp Sync: Alternative snapshot formats #8565

@rphmeier

Description

We want to investigate new snapshot formats which are better for the following properties:

  • Generation of (partial) snapshots on nodes
  • Reusability of chunks between snapshots to make finding them easier
  • Trustworthiness of data

Snapshot chunks are currently divided into two categories:

  • state chunks encode the entire account state of the blockchain at some block
  • security chunks provide corroboration for whether that block is valid without syncing the whole chain

The "security" chunks will look completely different under different consensus engines, but will usually contain reusable data. For example, validator-set-based consensus systems prove security with a series of bootstrapped handoffs (as long as we relax the weak subjectivity model to assume that old validator sets remain uncorrupted). All finalized handoffs can be reused, although their proofs are usually small enough that all of them fit in a single chunk. Depending on state churn and snapshot distance, we may also be able to reuse some state chunks.
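To make the handoff idea concrete, here is a minimal sketch of verifying a chain of bootstrapped handoffs from a trusted genesis set. All names (`Handoff`, `verify_handoffs`) and the plain-list signer model are hypothetical, not existing Parity APIs; real handoff proofs would carry actual signatures and enactment proofs.

```rust
// Hypothetical sketch: walking a chain of validator-set handoffs.
// Signatures are modeled as a plain list of endorsing validator IDs.

type ValidatorId = u64;

/// A finalized handoff: `new_set` was enacted, endorsed by members
/// of the *previous* validator set.
pub struct Handoff {
    pub signers: Vec<ValidatorId>,
    pub new_set: Vec<ValidatorId>,
}

/// Walk the handoff chain from a trusted genesis set; each step must be
/// endorsed by more than 2/3 of the set it hands off from. Returns the
/// final trusted set, or `None` if any handoff lacks endorsement.
pub fn verify_handoffs(
    genesis: &[ValidatorId],
    handoffs: &[Handoff],
) -> Option<Vec<ValidatorId>> {
    let mut current: Vec<ValidatorId> = genesis.to_vec();
    for h in handoffs {
        let endorsing = h.signers.iter().filter(|s| current.contains(s)).count();
        if endorsing * 3 <= current.len() * 2 {
            return None; // insufficient endorsement: chain is untrustworthy
        }
        current = h.new_set.clone();
    }
    Some(current)
}
```

Because each finalized handoff is immutable, the per-handoff entries are exactly the kind of data that can be shared verbatim between successive snapshots.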

One possibility is a keystone/delta model: we take intermittent "full" state snapshots every N*K blocks, and the snapshots between them (every K blocks) store only deltas over the most recent full snapshot.
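The schedule above can be sketched as a simple classification of block heights. The names (`SnapshotKind`, `snapshot_kind`) and parameters are illustrative only:

```rust
// Illustrative keystone/delta schedule: snapshots every K blocks, with a
// full ("keystone") snapshot every N*K blocks; everything else is a delta
// over the most recent keystone below it.

#[derive(Debug, PartialEq)]
pub enum SnapshotKind {
    /// Full state snapshot.
    Keystone,
    /// Delta over the keystone at block `base`.
    Delta { base: u64 },
    /// Not a snapshot block at all.
    NotSnapshot,
}

pub fn snapshot_kind(block: u64, k: u64, n: u64) -> SnapshotKind {
    if block % k != 0 {
        SnapshotKind::NotSnapshot
    } else if block % (n * k) == 0 {
        SnapshotKind::Keystone
    } else {
        // round down to the last keystone boundary
        SnapshotKind::Delta { base: block / (n * k) * (n * k) }
    }
}
```

For example, with K = 1000 and N = 10, block 12000 would be a delta over the keystone at block 10000.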

One major problem with the current snapshot system is that producing a snapshot is too heavy for most nodes to finish before they prune the state it encodes from their database. State chunks are currently very tightly packed, using a method that makes it impossible to determine which account a chunk starts at, or the exact data of an account entry, without having produced all the chunks before it. One possibility is to design a predictable scheme for chunk boundaries, which would allow nodes to produce some of the state chunks but not all.
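One candidate predictable-boundary scheme, sketched here purely as an illustration (it is not what the current packer does), is to partition the account trie by the leading byte of the hashed address, so chunk i always covers the same address range and can be produced independently of the others:

```rust
// Sketch of a predictable chunk-boundary scheme: chunk i covers exactly the
// accounts whose 32-byte address hash starts with byte i. A node that still
// has part of the state can produce just the chunks for the ranges it
// retains. Names are hypothetical.

/// Which chunk an account belongs to, keyed by the leading byte of its
/// address hash.
pub fn chunk_index(address_hash: &[u8; 32]) -> u8 {
    address_hash[0]
}

/// Group (address hash, account entry bytes) pairs into their 256 chunks.
pub fn partition(
    entries: Vec<([u8; 32], Vec<u8>)>,
) -> Vec<Vec<([u8; 32], Vec<u8>)>> {
    let mut chunks: Vec<Vec<([u8; 32], Vec<u8>)>> =
        (0..256).map(|_| Vec::new()).collect();
    for (hash, entry) in entries {
        chunks[chunk_index(&hash) as usize].push((hash, entry));
    }
    chunks
}
```

The trade-off versus the current tight packing is uneven chunk sizes, since account data is not uniformly distributed across prefixes; a real scheme would likely need variable-width prefixes or size caps.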

We can augment this scheme with random sampling: nodes which don't produce full snapshots will randomly sample some accounts and produce account entries for them, which they will keep on disk. They will refuse to propagate any snapshot where their random sample doesn't match the data in the snapshot. Assuming all repropagating nodes have their own random sample and a sufficiently large network, this makes it very unlikely for bad snapshot data to make its way through to unsynced nodes.
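A minimal sketch of that sampling check, assuming each node keeps a small map of sampled account entries on disk and refuses to relay any chunk that contradicts one of them (the `Sampler` type and its methods are hypothetical):

```rust
// Random-sample verification sketch: a node records account entries while
// it still has the state locally, then checks relayed chunks against them.

use std::collections::HashMap;

pub struct Sampler {
    /// address hash -> account entry bytes, recorded from local state
    samples: HashMap<[u8; 32], Vec<u8>>,
}

impl Sampler {
    pub fn new() -> Self {
        Sampler { samples: HashMap::new() }
    }

    /// Record a randomly sampled account entry while the state is local.
    pub fn record(&mut self, addr: [u8; 32], entry: Vec<u8>) {
        self.samples.insert(addr, entry);
    }

    /// Propagate a chunk only if every sampled account appearing in it
    /// matches the locally stored entry byte-for-byte.
    pub fn should_propagate(&self, chunk: &[([u8; 32], Vec<u8>)]) -> bool {
        chunk.iter().all(|(addr, entry)| {
            self.samples.get(addr).map_or(true, |local| local == entry)
        })
    }
}
```

Any single node's sample covers only a sliver of the state, but if every repropagating node checks its own independent sample, a snapshot with fabricated account data is very likely to be dropped somewhere along the propagation path.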

cc @ngotchac, @Vurich, @ordian

Metadata


    Labels

    F7-optimisation 💊 An enhancement to provide better overall performance in terms of time-to-completion for a task.
    M4-core ⛓ Core client code / Rust.
