Skip to content

Added crypto #5

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 12, 2020
Merged

Added crypto #5

merged 2 commits into from
May 12, 2020

Conversation

That3Percent
Copy link
Contributor

This PR introduces a cryptographically strong StableHash implementation suitable for PoI v2. Cryptographically strong in this context means:

  • Pre-image resistant (It is impossible to manufacture collisions, even with a maliciously crafted subgraph and indexer working in tandem)
  • No information leak (Given a StableHash, schema, and even some of the members of the multiset hash it is not possible to create new derived hashes with known members removed or modified)

In this implementation, the SequenceNumber uniquely identifies a database cell. The cell id and the payload are concatenated and hashed together. Each output hash being is fed into a multi-set aggregation function.

Interestingly this implementation allows us to incrementally update a StableHash for all blocks in a way that would be the same as if it was a hash of all blocks and events up to that point. Eg: The final stable hash will be the same as the hash of the huge struct:

struct PoI {
   blocks: Vec<Block>,
}
struct Block {
   events: Vec<PoIEvent>,
}

In this way, all block heights have a unique hash (because they contain some number of empty blocks). Yet it is very performant to add data because nothing actually needs to be written for empty blocks even though they affect the result. All that needs to happen at request time is to mix in the block count to close off the PoI::blocks vec.

The non-cryptographic stable hash implementations remain as a part of this crate.

The traits had to be slightly altered so that any StableHasher could know the internals for it's compatible SequenceNumber implementation - moving SequenceNumber into an associated type.

Also, the implementation of the unordered hash had to be slightly altered to fit both the crypto and non-crypto implementations. This changed the output for any existing value which uses a non-ordered hash (eg: HashMap).

@That3Percent That3Percent requested review from jacksonblazensky, a team and leoyvens May 11, 2020 22:28
Copy link

@jacksonblazensky jacksonblazensky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few questions. Other than that, everything looks good to me. It seems like the changes didn't require a total overhaul or anything – they fit right in like you mentioned. Great work!

/// identify parts within the struct. Conveniently, the SequenceNumber::skip
/// method can be used to jump to parts of a vec or struct efficiently.
pub struct SetHasher {
// TODO: (Performance). We want an int 2056 + 2048 = 4104 bit int.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that you plan to implement a crate / traits that can easily give 4104 bit ints, and once this is done, the set hash implementation should be more performant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The current choice num-bigint::BigUint requires multiple variable-length heap allocations for every mixin. It is possible to have a large int be allocated on the stack for a quick and easy performance gain.

type Bytes = [u8; $size];
#[inline(always)]
fn prime_init() -> Self {
17

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this initial prime 17 and the initial number 486,187,739 (line 77) come from? I don't recall them from the MSet-Mu-Hash paper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are used exclusively in the non-cryptographic variant. They were found by exhaustive search of 32 bit values to be the least likely to produce collisions.

Copy link

@leoyvens leoyvens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a few comments about Rust style, which is the part of this I can speak to.

fn root() -> Self {
Self {
hasher: Hasher::new(),
child: unsafe { NonZeroUsize::new_unchecked(1) },

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer NonZeroUsize::new(1).unwrap() to avoid the unsafe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for verifying the assembly comes out the same offline! Coming from C# before this, the Rust compiler always amazes me at just how much less one has to worry all the time.

write_varint(&mut hasher, child.get().try_into().unwrap()).unwrap();
Self {
hasher,
child: unsafe { NonZeroUsize::new_unchecked(1) },

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

let rollup = self
.rollup
.wrapping_mul(T::prime_mult())
.wrapping_add(child.try_into().unwrap_or_else(|_| panic!("Overflow")));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why unwrap_or_else instead of just unwrap, considering we are not adding any information in the message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unwrap would require that the trait UInt's TryFrom<usize> have the bound that TryFrom<usize, Error=::std::fmt::Debug>. This isn't legal Rust today. So, the only way to get that trait bound would be to pollute a bunch of common method signatures with where clauses which I very much wish to avoid.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

src/utils.rs Outdated

pub struct XorAggregator<T> {
value: u64,
_marker: std::marker::PhantomData<*const T>,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not PhantomData<T> here? Also PhantomData is imported so you don't need to qualify it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PhantomData<T> would infer that XorAggregator logically "owns" a T. But, the code only uses T to construct ephemeral values.

From the docs for PhantomData:

If your struct does not in fact own the data of type T, it is better to use a reference type, like PhantomData<&'a T> (ideally) or PhantomData<*const T> (if no lifetime applies), so as not to indicate ownership.

Here, T does not factor into the drop check analysis. Some methods construct a T on the stack and use the value temporarily, but no ownership or lifetimes apply to the "containing" struct.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I went into this obscure corner of the language and that doc may be outdated rust-lang/rust#70841 and there is a controversy I'm not even close to understanding around it. Anyways just a curiosity, nothing wrong with following the current docs here.

pub struct StableHasherWrapper<T>(T);
pub struct StableHasherWrapper<H, Seq> {
hasher: H,
_marker: PhantomData<*const Seq>,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not PhantomData<Seq>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@That3Percent
Copy link
Contributor Author

@leoyvens I believe I've addressed all your comments. Care to update the review?

@That3Percent That3Percent merged commit 722ea32 into master May 12, 2020
@That3Percent That3Percent deleted the crypto branch July 1, 2021 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants