-
Notifications
You must be signed in to change notification settings - Fork 7
Added crypto #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added crypto #5
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few questions. Other than that, everything looks good to me. It seems like the changes didn't require a total overhaul or anything – they fit right in like you mentioned. Great work!
/// identify parts within the struct. Conveniently, the SequenceNumber::skip | ||
/// method can be used to jump to parts of a vec or struct efficiently. | ||
pub struct SetHasher { | ||
// TODO: (Performance). We want an int 2056 + 2048 = 4104 bit int. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying that you plan to implement a crate / traits that can easily give 4104 bit ints, and once this is done, the set hash implementation should be more performant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The current choice num-bigint::BigUint
requires multiple variable-length heap allocations for every mixin. It is possible to have a large int be allocated on the stack for a quick and easy performance gain.
type Bytes = [u8; $size]; | ||
#[inline(always)] | ||
fn prime_init() -> Self { | ||
17 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this initial prime 17 and the initial number 486,187,739 (line 77) come from? I don't recall them from the MSet-Mu-Hash paper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are used exclusively in the non-cryptographic variant. They were found by exhaustive search of 32 bit values to be the least likely to produce collisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a few comments about Rust style, which is the part of this I can speak to.
src/crypto/blake3_sequence.rs
Outdated
fn root() -> Self { | ||
Self { | ||
hasher: Hasher::new(), | ||
child: unsafe { NonZeroUsize::new_unchecked(1) }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer NonZeroUsize::new(1).unwrap()
to avoid the unsafe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for verifying the assembly comes out the same offline! Coming from C# before this, the Rust compiler always amazes me at just how much less one has to worry all the time.
src/crypto/blake3_sequence.rs
Outdated
write_varint(&mut hasher, child.get().try_into().unwrap()).unwrap(); | ||
Self { | ||
hasher, | ||
child: unsafe { NonZeroUsize::new_unchecked(1) }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
let rollup = self | ||
.rollup | ||
.wrapping_mul(T::prime_mult()) | ||
.wrapping_add(child.try_into().unwrap_or_else(|_| panic!("Overflow"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why unwrap_or_else
instead of just unwrap
, considering we are not adding any information in the message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unwrap
would require that the trait UInt
's TryFrom<usize>
have the bound that TryFrom<usize, Error=::std::fmt::Debug>
. This isn't legal Rust today. So, the only way to get that trait bound would be to pollute a bunch of common method signatures with where
clauses which I very much wish to avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
src/utils.rs
Outdated
|
||
pub struct XorAggregator<T> { | ||
value: u64, | ||
_marker: std::marker::PhantomData<*const T>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not PhantomData<T>
here? Also PhantomData
is imported so you don't need to qualify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PhantomData<T>
would infer that XorAggregator
logically "owns" a T
. But, the code only uses T
to construct ephemeral values.
From the docs for PhantomData:
If your struct does not in fact own the data of type T, it is better to use a reference type, like PhantomData<&'a T> (ideally) or PhantomData<*const T> (if no lifetime applies), so as not to indicate ownership.
Here, T
does not factor into the drop check analysis. Some methods construct a T
on the stack and use the value temporarily, but no ownership or lifetimes apply to the "containing" struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I went into this obscure corner of the language and that doc may be outdated rust-lang/rust#70841 and there is a controversy I'm not even close to understanding around it. Anyways just a curiosity, nothing wrong with following the current docs here.
pub struct StableHasherWrapper<T>(T); | ||
pub struct StableHasherWrapper<H, Seq> { | ||
hasher: H, | ||
_marker: PhantomData<*const Seq>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not PhantomData<Seq>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
@leoyvens I believe I've addressed all your comments. Care to update the review? |
This PR introduces a cryptographically strong
StableHash
implementation suitable for PoI v2. Cryptographically strong in this context means:StableHash
, schema, and even some of the members of the multiset hash it is not possible to create new derived hashes with known members removed or modified)In this implementation, the
SequenceNumber
uniquely identifies a database cell. The cell id and the payload are concatenated and hashed together. Each output hash being is fed into a multi-set aggregation function.Interestingly this implementation allows us to incrementally update a
StableHash
for all blocks in a way that would be the same as if it was a hash of all blocks and events up to that point. Eg: The final stable hash will be the same as the hash of the huge struct:In this way, all block heights have a unique hash (because they contain some number of empty blocks). Yet it is very performant to add data because nothing actually needs to be written for empty blocks even though they affect the result. All that needs to happen at request time is to mix in the block count to close off the
PoI::blocks
vec.The non-cryptographic stable hash implementations remain as a part of this crate.
The traits had to be slightly altered so that any
StableHasher
could know the internals for it's compatibleSequenceNumber
implementation - movingSequenceNumber
into an associated type.Also, the implementation of the unordered hash had to be slightly altered to fit both the crypto and non-crypto implementations. This changed the output for any existing value which uses a non-ordered hash (eg:
HashMap
).