Our current Proof-of-Validity (PoV) distribution system for backing is quite simple: a validator fetches a full PoV from a collator, the validator announces the PoV, and the other validators in the group fetch it in full.
PoV hashes and trie commitments
It is possible to change the semantics of the pov_hash in the candidate descriptor to refer to a merkle trie of the PoV data as opposed to the entire data. This needs to be done in a backwards-compatible way, so we still support full PoVs. What we would allow is something like this:
const CHUNK_SIZE = 32KiB;
struct PoVTrieCommitment(root_hash, n_chunks);
The trie is computed as a mapping from the chunk index i to the hash of chunk i. The commitment consists of the root hash of this merkle trie and the number of chunks contained in the trie. The rightmost chunk may be smaller than 32KiB, so the upper bound on the PoV size is given by n_chunks * CHUNK_SIZE. Storing the number of chunks is important so that validators can tell immediately from the trie commitment whether the PoV is oversized.
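As a rough sketch of how such a commitment could be computed, the following uses a plain binary merkle tree over chunk hashes as a stand-in for the index-keyed trie, with blake2_256 as an assumed placeholder for the node's hashing primitive. All names here are illustrative, not the actual implementation:

```rust
const CHUNK_SIZE: usize = 32 * 1024;

type Hash = [u8; 32];

// Placeholder for the node's 256-bit hashing primitive.
fn blake2_256(_data: &[u8]) -> Hash {
    unimplemented!("provided by the node's hashing primitive")
}

struct PoVTrieCommitment {
    root_hash: Hash,
    n_chunks: u32,
}

// Hash every 32KiB chunk (the last one may be shorter), then fold the chunk
// hashes pairwise into a single root. A binary merkle tree is used here purely
// as an illustration of the index -> chunk-hash mapping.
fn trie_commitment(pov: &[u8]) -> PoVTrieCommitment {
    let mut layer: Vec<Hash> = pov.chunks(CHUNK_SIZE).map(blake2_256).collect();
    let n_chunks = layer.len() as u32;

    while layer.len() > 1 {
        layer = layer
            .chunks(2)
            .map(|pair| {
                let mut buf = Vec::with_capacity(64);
                buf.extend_from_slice(&pair[0]);
                // An odd node at the end of a layer is paired with itself.
                buf.extend_from_slice(pair.get(1).unwrap_or(&pair[0]));
                blake2_256(&buf)
            })
            .collect();
    }

    PoVTrieCommitment {
        root_hash: layer.first().copied().unwrap_or([0u8; 32]),
        n_chunks,
    }
}
```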
The pov_hash in the candidate descriptor is then defined to be either hash(PoV) or hash(PoVTrieCommitment). This allows for backwards compatibility. Note that the erasure root in the candidate receipt is still computed over an erasure-coding of the full PoV data and will not change. However, it becomes necessary for approval and dispute checkers to verify that the pov_hash in the descriptor matches the trie commitment computed from the full PoV data. Backing checkers, having recovered the PoV from the trie, do not need to perform this check.
As the old format pov_hash = hash(pov) is still supported, validators checking the validity of a descriptor will have to check pov_hash == hash(pov) || pov_hash == trie_commitment(pov). However, computing the commitment can be done in a single pass over the PoV data without any allocations, just by hashing the stream and finalizing a chunk hash every 32KiB, so it should be relatively cheap: instead of taking one hash pass over the PoV, we take two, and this is not a bottleneck for approval checkers.
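A minimal sketch of that backwards-compatible check, reusing the hypothetical blake2_256 and trie_commitment helpers from the sketch above; encode is a stand-in for SCALE-encoding the commitment before hashing:

```rust
// Placeholder for SCALE-encoding the commitment; the layout is assumed.
fn encode(c: &PoVTrieCommitment) -> Vec<u8> {
    let mut out = c.root_hash.to_vec();
    out.extend_from_slice(&c.n_chunks.to_le_bytes());
    out
}

// Accept either the legacy full-PoV hash or the new commitment hash.
fn pov_hash_matches(pov_hash: Hash, pov: &[u8]) -> bool {
    pov_hash == blake2_256(pov)
        || pov_hash == blake2_256(&encode(&trie_commitment(pov)))
}
```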
Pre-validation functions (PreVF)
Authenticating a collator's submitted block without downloading the entire PoV is impossible right now. To address this, we use the notion of a Pre-validation function, or PreVF for short. PreVFs also have their own counterpart to the PoV: the Proof-of-Pre-Validity, or PrePoV for short. Definitionally, PoVs and PrePoVs are both just byte vectors; however, they have different semantic meanings and can have different length limits imposed. For example, while PoVs may be allowed to span multiple megabytes, a PrePoV may be limited to just a few kilobytes.
This is a new function on the parachain validation code which has approximately this signature:
fn pre_validate(
parachain_parent_head_data,
new_para_head_hash,
relay_parent_number,
relay_parent_state_root,
PrePoV,
) -> Result<CollatorId, Bad>
We introduce the notion of reasonable uniqueness. For a PreVF to function effectively, its result must be reasonably unique. Signatures by a specific validator are reasonably unique so long as equivocations are punished economically. Proofs of Work are reasonably unique if they are sufficiently difficult. And so on. Occasional collisions in the PreVF output are acceptable, but the multiplicity should be low even when they occur.
With PreVFs, collators can send over (CandidateReceipt, PrePoV) pairs. The PrePoV can be used by validators to determine preliminary validity.
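As an illustration of how a backing validator might gate recovery on the PreVF result, here is a sketch under the assumption that the returned CollatorId is compared against the collator named in the receipt. All type definitions below are simplified placeholders (reusing the Hash alias from the earlier sketch), not the real primitives:

```rust
type CollatorId = [u8; 32];

// Illustrative stand-in for the candidate receipt fields needed here.
struct CandidateReceipt {
    collator: CollatorId,
    para_head_hash: Hash,
    // other descriptor fields elided
}

struct Bad;

// The PreVF itself is supplied by the parachain validation code.
fn pre_validate(
    _parachain_parent_head_data: &[u8],
    _new_para_head_hash: Hash,
    _relay_parent_number: u32,
    _relay_parent_state_root: Hash,
    _pre_pov: &[u8],
) -> Result<CollatorId, Bad> {
    unimplemented!("implemented by the parachain")
}

// Decide whether a (CandidateReceipt, PrePoV) pair is worth recovering:
// the PreVF must attribute the PrePoV to the collator named in the receipt.
fn accept_for_recovery(
    receipt: &CandidateReceipt,
    pre_pov: &[u8],
    parent_head: &[u8],
    relay_parent_number: u32,
    relay_parent_state_root: Hash,
) -> bool {
    match pre_validate(
        parent_head,
        receipt.para_head_hash,
        relay_parent_number,
        relay_parent_state_root,
        pre_pov,
    ) {
        Ok(collator) => collator == receipt.collator,
        Err(Bad) => false,
    }
}
```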
Note that we don't rely on PreVFs for security of any kind. A parachain exposing a PreVF whose output is not reasonably unique only detracts from the liveness of that parachain, as it can only be used to force validators to waste bandwidth and fail to back any candidate for the parachain.
Torrenting PoVs
We introduce a new request type:
struct Request { pov_hash, index };
struct Response { merkle_proof, chunk_data };
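On the requesting side, each response can be checked against the trie commitment before the chunk is accepted. The sketch below assumes the merkle_proof is a path of sibling hashes matching the illustrative binary merkle tree from the earlier sketch, which stands in for whatever proof format is actually adopted:

```rust
// Illustrative response shape; the real network types may differ.
struct ChunkResponse {
    merkle_proof: Vec<Hash>,
    chunk_data: Vec<u8>,
}

fn verify_chunk(
    commitment: &PoVTrieCommitment,
    index: u32,
    response: &ChunkResponse,
) -> bool {
    // Out-of-range indices and oversized chunks are rejected immediately.
    if index >= commitment.n_chunks || response.chunk_data.len() > CHUNK_SIZE {
        return false;
    }

    // Recompute the root from the chunk hash and the sibling path.
    let mut node = blake2_256(&response.chunk_data);
    let mut position = index;
    for sibling in &response.merkle_proof {
        let mut buf = Vec::with_capacity(64);
        if position % 2 == 0 {
            buf.extend_from_slice(&node);
            buf.extend_from_slice(sibling);
        } else {
            buf.extend_from_slice(sibling);
            buf.extend_from_slice(&node);
        }
        node = blake2_256(&buf);
        position /= 2;
    }
    node == commitment.root_hash
}
```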
We introduce new gossip statements, to be used exclusively among backing validators:
IntentToRecoverSeconded(CandidateHash, PoVTrieCommitment);
IntentToRecover(CandidateReceipt, PoVTrieCommitment, PrePoV);
Stage 1: Torrenting among validator groups
Stage 1 can be implemented before PreVFs. At this point, a seconding validator still needs to download the full PoV from a collator and check it before seconding. However, after issuing the Seconded message, the validator can issue an IntentToRecoverSeconded(candidate_hash, trie_commitment), and the other validators in the group can also signal an intent to recover the data; they will recover it from the seconding validator as well as from each other.
We have a few pieces of information that can be used to determine the chunk range that validators initially download, to maximize initial coverage:
- Each validator's index in the backing group
- The number of validators in the backing group
- The overall size of the PoV
- The fact that a majority opinion is needed within the backing group
Validators can choose the initial ranges of chunks they download based on this information, to achieve a desirable balance of redundancy and availability.
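One possible, purely illustrative assignment rule is to split the chunk index space evenly across the group by each validator's position, so that the group's initial fetches cover the whole PoV with minimal overlap; redundancy can then be layered on top of this:

```rust
// Illustrative only: evenly partition chunk indices across the backing group.
fn initial_chunk_range(
    validator_index_in_group: u32,
    group_size: u32,
    n_chunks: u32,
) -> std::ops::Range<u32> {
    let group_size = group_size.max(1);
    // Ceiling division so that every chunk is assigned to some validator.
    let per_validator = (n_chunks + group_size - 1) / group_size;
    let start = (validator_index_in_group * per_validator).min(n_chunks);
    let end = (start + per_validator).min(n_chunks);
    start..end
}
```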
Stage 2: Torrenting before seconding
PreVFs will enable validators to recover and torrent a PoV among all validators in the group as well as potentially from the collator even before checking the candidate. There should be a limit of 2 or 3 recoveries at a time, and due to the condition of reasonable uniqueness we expect on PreVFs, this should not affect throughput.
A validator, having checked a PreVF, can distribute the PreVF to other validators in the group, and all begin torrenting chunks from each other as well as from the initial collator. The same basic principles for choosing chunk ranges to fetch apply, with the addition of potential rules for collators to follow about which chunks to prioritize serving to which validators.
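A small sketch of how the recovery limit mentioned above might be enforced on the validator side; the structure and names are illustrative only (reusing the Hash alias from earlier as the candidate hash type):

```rust
use std::collections::HashSet;

// Illustrative cap on concurrent PreVF-based recoveries.
const MAX_CONCURRENT_RECOVERIES: usize = 2;

struct RecoveryManager {
    // Candidate hashes currently being recovered.
    active: HashSet<Hash>,
}

impl RecoveryManager {
    fn new() -> Self {
        RecoveryManager { active: HashSet::new() }
    }

    // Returns true if the recovery may start now; false means defer it until
    // one of the active recoveries concludes.
    fn try_start(&mut self, candidate_hash: Hash) -> bool {
        if self.active.len() >= MAX_CONCURRENT_RECOVERIES {
            return false;
        }
        self.active.insert(candidate_hash)
    }

    fn finish(&mut self, candidate_hash: &Hash) {
        self.active.remove(candidate_hash);
    }
}
```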