Our current Proof-of-Validity (PoV) distribution system for backing is quite simple: a validator fetches a full PoV from a collator, the validator announces the PoV, and the other validators in the group fetch it in full.
PoV hashes and trie commitments
It is possible to change the semantics of the pov_hash in the candidate descriptor to refer to a merkle trie of the PoV data as opposed to the entire data. This needs to be done in a backwards-compatible way, so we still support full PoVs. What we would allow is something like this:
const CHUNK_SIZE = 32KiB;
struct PoVTrieCommitment(root_hash, n_chunks);
The trie is computed as a mapping from the chunk index i to the hash of chunk i. The commitment consists of the root hash of this merkle trie and the number of chunks contained in the trie. The rightmost chunk may be smaller than 32KiB, so the upper bound on the PoV size is given by n_chunks * CHUNK_SIZE. Storing the number of chunks is important so that validators can tell immediately from the trie commitment whether the PoV is oversized.
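As a rough sketch of how such a commitment could be computed, the following uses a plain binary merkle tree over chunk hashes as a stand-in for the index-keyed trie, with blake2_256 as an assumed placeholder for the node's hashing primitive. All names here are illustrative, not the actual implementation:

```rust
const CHUNK_SIZE: usize = 32 * 1024;

type Hash = [u8; 32];

// Placeholder for the node's 256-bit hashing primitive.
fn blake2_256(_data: &[u8]) -> Hash {
    unimplemented!("provided by the node's hashing primitive")
}

struct PoVTrieCommitment {
    root_hash: Hash,
    n_chunks: u32,
}

// Hash every 32KiB chunk (the last one may be shorter), then fold the chunk
// hashes pairwise into a single root. A binary merkle tree is used here purely
// as an illustration of the index -> chunk-hash mapping.
fn trie_commitment(pov: &[u8]) -> PoVTrieCommitment {
    let mut layer: Vec<Hash> = pov.chunks(CHUNK_SIZE).map(blake2_256).collect();
    let n_chunks = layer.len() as u32;

    while layer.len() > 1 {
        layer = layer
            .chunks(2)
            .map(|pair| {
                let mut buf = Vec::with_capacity(64);
                buf.extend_from_slice(&pair[0]);
                // An odd node at the end of a layer is paired with itself.
                buf.extend_from_slice(pair.get(1).unwrap_or(&pair[0]));
                blake2_256(&buf)
            })
            .collect();
    }

    PoVTrieCommitment {
        root_hash: layer.first().copied().unwrap_or([0u8; 32]),
        n_chunks,
    }
}
```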
The pov_hash in the candidate descriptor is then defined to be either hash(PoV) or hash(PoVTrieCommitment). This allows for backwards compatibility. Note that the erasure root in the candidate receipt is still computed over an erasure-coding of the full PoV data and will not change. However, it becomes necessary for approval and dispute checkers to verify that the pov_hash in the descriptor matches the trie commitment computed from the full PoV data. Backing checkers, having recovered the PoV from the trie, do not need to perform this check.
As the old format pov_hash = hash(pov) is still supported, validators checking the validity of a descriptor will have to check pov_hash == hash(pov) || pov_hash == trie_commitment(pov). However, computing the commitment can be done in a single pass over the PoV data without any allocations, just by hashing the stream and finalizing a chunk hash every 32KiB, so it should be relatively cheap: instead of taking one hash pass over the PoV, we take two, and this is not a bottleneck for approval checkers.
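A minimal sketch of that backwards-compatible check, reusing the hypothetical blake2_256 and trie_commitment helpers from the sketch above; encode is a stand-in for SCALE-encoding the commitment before hashing:

```rust
// Placeholder for SCALE-encoding the commitment; the layout is assumed.
fn encode(c: &PoVTrieCommitment) -> Vec<u8> {
    let mut out = c.root_hash.to_vec();
    out.extend_from_slice(&c.n_chunks.to_le_bytes());
    out
}

// Accept either the legacy full-PoV hash or the new commitment hash.
fn pov_hash_matches(pov_hash: Hash, pov: &[u8]) -> bool {
    pov_hash == blake2_256(pov)
        || pov_hash == blake2_256(&encode(&trie_commitment(pov)))
}
```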
Pre-validation functions (PreVF)
Authenticating a collator's submitted block without downloading the entire PoV is impossible right now. To address this, we use the notion of a Pre-validation function, or PreVF for short. PreVFs also have their own counterpart to the PoV: the Proof-of-Pre-Validity, or PrePoV for short. Definitionally, PoVs and PrePoVs are both just byte vectors; however, they have different semantic meanings and can have different length limits imposed. For example, while PoVs may be allowed to span multiple megabytes, a PrePoV may be limited to just a few kilobytes.
This is a new function on the parachain validation code which has approximately this signature:
fn pre_validate(
parachain_parent_head_data,
new_para_head_hash,
relay_parent_number,
relay_parent_state_root,
PrePoV,
) -> Result<CollatorId, Bad>
We introduce the notion of reasonable uniqueness. For a PreVF to function effectively, its result must be reasonably unique. Signatures by a specific validator are reasonably unique so long as equivocations are punished economically. Proofs of Work are reasonably unique if they are sufficiently difficult. And so on. Occasional collisions in the PreVF output are acceptable, but the multiplicity should be low even when they occur.
With PreVFs, collators can send over (CandidateReceipt, PrePoV) pairs. The PrePoV can be used by validators to determine preliminary validity.
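As an illustration of how a backing validator might gate recovery on the PreVF result, here is a sketch under the assumption that the returned CollatorId is compared against the collator named in the receipt. All type definitions below are simplified placeholders (reusing the Hash alias from the earlier sketch), not the real primitives:

```rust
type CollatorId = [u8; 32];

// Illustrative stand-in for the candidate receipt fields needed here.
struct CandidateReceipt {
    collator: CollatorId,
    para_head_hash: Hash,
    // other descriptor fields elided
}

struct Bad;

// The PreVF itself is supplied by the parachain validation code.
fn pre_validate(
    _parachain_parent_head_data: &[u8],
    _new_para_head_hash: Hash,
    _relay_parent_number: u32,
    _relay_parent_state_root: Hash,
    _pre_pov: &[u8],
) -> Result<CollatorId, Bad> {
    unimplemented!("implemented by the parachain")
}

// Decide whether a (CandidateReceipt, PrePoV) pair is worth recovering:
// the PreVF must attribute the PrePoV to the collator named in the receipt.
fn accept_for_recovery(
    receipt: &CandidateReceipt,
    pre_pov: &[u8],
    parent_head: &[u8],
    relay_parent_number: u32,
    relay_parent_state_root: Hash,
) -> bool {
    match pre_validate(
        parent_head,
        receipt.para_head_hash,
        relay_parent_number,
        relay_parent_state_root,
        pre_pov,
    ) {
        Ok(collator) => collator == receipt.collator,
        Err(Bad) => false,
    }
}
```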
Note that we don't rely on PreVFs for security of any kind. A parachain exposing a PreVF whose output is not reasonably unique only detracts from the liveness of that parachain, as it can only be used to force validators to waste bandwidth and fail to back any candidate for the parachain.
Torrenting PoVs
We introduce a new request type:
struct Request { pov_hash, index };
struct Response { merkle_proof, chunk_data };
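On the requesting side, each response can be checked against the trie commitment before the chunk is accepted. The sketch below assumes the merkle_proof is a path of sibling hashes matching the illustrative binary merkle tree from the earlier sketch, which stands in for whatever proof format is actually adopted:

```rust
// Illustrative response shape; the real network types may differ.
struct ChunkResponse {
    merkle_proof: Vec<Hash>,
    chunk_data: Vec<u8>,
}

fn verify_chunk(
    commitment: &PoVTrieCommitment,
    index: u32,
    response: &ChunkResponse,
) -> bool {
    // Out-of-range indices and oversized chunks are rejected immediately.
    if index >= commitment.n_chunks || response.chunk_data.len() > CHUNK_SIZE {
        return false;
    }

    // Recompute the root from the chunk hash and the sibling path.
    let mut node = blake2_256(&response.chunk_data);
    let mut position = index;
    for sibling in &response.merkle_proof {
        let mut buf = Vec::with_capacity(64);
        if position % 2 == 0 {
            buf.extend_from_slice(&node);
            buf.extend_from_slice(sibling);
        } else {
            buf.extend_from_slice(sibling);
            buf.extend_from_slice(&node);
        }
        node = blake2_256(&buf);
        position /= 2;
    }
    node == commitment.root_hash
}
```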
We introduce new gossip statements, to be used exclusively among backing validators:
IntentToRecoverSeconded(CandidateHash, PoVTrieCommitment);
IntentToRecover(CandidateReceipt, PoVTrieCommitment, PrePoV);
Stage 1: Torrenting among validator groups
Stage 1 can be implemented before PreVFs. At this point, a seconding validator still needs to download the full PoV from a collator and check it before seconding. However, after issuing the Seconded message, the validator can issue an IntentToRecoverSeconded(candidate_hash, trie_commitment), and the other validators in the group can also signal an intent to recover the data; they will recover it from the seconding validator as well as from each other.
We have a few pieces of information that can be used to determine the chunk range that validators initially download, to maximize initial coverage:
- Each validator's index in the backing group
- The number of validators in the backing group
- The overall size of the PoV
- The fact that a majority opinion is needed within the backing group
Validators can choose the initial ranges of chunks they download based on this information, to achieve a desirable balance of redundancy and availability.
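One possible, purely illustrative assignment rule is to split the chunk index space evenly across the group by each validator's position, so that the group's initial fetches cover the whole PoV with minimal overlap; redundancy can then be layered on top of this:

```rust
// Illustrative only: evenly partition chunk indices across the backing group.
fn initial_chunk_range(
    validator_index_in_group: u32,
    group_size: u32,
    n_chunks: u32,
) -> std::ops::Range<u32> {
    let group_size = group_size.max(1);
    // Ceiling division so that every chunk is assigned to some validator.
    let per_validator = (n_chunks + group_size - 1) / group_size;
    let start = (validator_index_in_group * per_validator).min(n_chunks);
    let end = (start + per_validator).min(n_chunks);
    start..end
}
```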
Stage 2: Torrenting before seconding
PreVFs will enable validators to recover and torrent a PoV among all validators in the group as well as potentially from the collator even before checking the candidate. There should be a limit of 2 or 3 recoveries at a time, and due to the condition of reasonable uniqueness we expect on PreVFs, this should not affect throughput.
A validator, having checked a PreVF, can distribute the PreVF to other validators in the group, and all begin torrenting chunks from each other as well as from the initial collator. The same basic principles for choosing chunk ranges to fetch apply, with the addition of potential rules for collators to follow about which chunks to prioritize serving to which validators.
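A small sketch of how the recovery limit mentioned above might be enforced on the validator side; the structure and names are illustrative only (reusing the Hash alias from earlier as the candidate hash type):

```rust
use std::collections::HashSet;

// Illustrative cap on concurrent PreVF-based recoveries.
const MAX_CONCURRENT_RECOVERIES: usize = 2;

struct RecoveryManager {
    // Candidate hashes currently being recovered.
    active: HashSet<Hash>,
}

impl RecoveryManager {
    fn new() -> Self {
        RecoveryManager { active: HashSet::new() }
    }

    // Returns true if the recovery may start now; false means defer it until
    // one of the active recoveries concludes.
    fn try_start(&mut self, candidate_hash: Hash) -> bool {
        if self.active.len() >= MAX_CONCURRENT_RECOVERIES {
            return false;
        }
        self.active.insert(candidate_hash)
    }

    fn finish(&mut self, candidate_hash: &Hash) {
        self.active.remove(candidate_hash);
    }
}
```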