Conversation
|
Nice work! I'm quite busy this week and won't have time to perform a full review, however I will try to find some time next for it. |
Amanieu
left a comment
There was a problem hiding this comment.
Here's an initial review with some comments. I will do a further review later.
src/hash_table/chaining.rs
Outdated
| pub use crate::hash_table::ChainedLoadFactor; | ||
| use crate::key_adapter::KeyAdapter; | ||
| use crate::linked_list::Link as LinkedListLink; | ||
| use crate::linked_list::LinkedList; |
There was a problem hiding this comment.
You really should be using singly-linked lists here. The memory savings are significant and nobody really cares about iterating through hash tables backwards.
There was a problem hiding this comment.
I had originally planned for O(1) inserts and removals. If I use singly-linked lists, the constant time remove operation is lost.
I had also wanted a general purpose implementation. Currently the max load factor is not constrained; the user is free to create hash tables where the number of buckets is small relative to the number of items. Linear time removals could affect this kind of use case.
I have a use case where link: LinkedListLink is reused between a double-ended-queue and a hash table to save memory. I'll need an extra pointer if I use singly-linked lists, unless a LinkedListLink can be used as a SinglyLinkedListLink.
I think the solution is two chaining hash table implementations:
Which names do you like?
- singly-linked list
SinglyLinkedChainedHashTableSinglyChainedHashTableSinglyLinkedHashTable
- linked list
LinkedChainedHashTableChainedHashTableLinkedHashTable
There was a problem hiding this comment.
O(1) removal can be obtained by having cursors point to the previous element in the linked list.
Sharing a link between a linked list and a hash table can be solved by #27 which would allow you to implement your own Link types and make them work for different collections.
src/hash_table/chaining.rs
Outdated
| buckets: B, | ||
| buckets_bits: C, | ||
| buckets_len: usize, | ||
| hash_builder: S, |
There was a problem hiding this comment.
As I mentioned above, it is probably better to leave the choice of hasher to the adapter. As a bonus this eliminates our dependency on std.
|
I would like to make one major change regarding the trait Array<T>: DerefMut<[T]> {
/// Hint to the array that the hash table is about to insert additional elements.
/// This function can return a new array to tell the hash table to rehash into the new array.
fn reserve(&mut self, len: usize) -> Option<Self>;
/// Hint to shrink the capacity of the array if possible.
/// This function can return a new array to tell the hash table to rehash into the new array.
fn shrink_to_fit(&mut self, len: usize) -> Option<Self>;
}The key change here is that the The implementations of For dynamic arrays which track the load factor we will provide a |
|
Yes, I plan on finishing this later this week. |
f5d36dc to
ba60b09
Compare
|
I've been ruminating about the design and came to the conclusion that it is much cleaner to treat This means that I wanted to get the current design reviewed. Also, test coverage is 93.2%, so check the tests for example usage. Outstanding issues
Some notesThere are now two kinds of cursors:
|
Amanieu
left a comment
There was a problem hiding this comment.
Initial review, I haven't finished reading through all the code yet.
| /// This function can return a new array to tell the hash table to rehash into the new array. | ||
| fn shrink_to(&mut self, min_capacity: usize) -> Option<Self>; | ||
|
|
||
| /// Returns the capacity. |
There was a problem hiding this comment.
| /// Returns the capacity. | |
| /// Returns the number of elements the map can hold without reallocating. | |
| /// | |
| /// This number is a lower bound; the hash table might be able to hold | |
| /// more, but is guaranteed to be able to hold at least this many. |
| } | ||
|
|
||
| impl<T> Array<T> for &'_ mut [T] { | ||
| #[inline(never)] |
| inner: Vec<T>, | ||
| capacity: usize, | ||
| load_factor: LoadFactor, | ||
| len: usize, |
| impl<T: Default> Array<T> for DynamicArray<T> { | ||
| #[inline(never)] | ||
| fn reserve(&mut self, new_capacity: usize) -> Option<Self> { | ||
| let new_len = new_capacity; |
There was a problem hiding this comment.
Just rename the parameter to new_len?
| 10 11 12 13 14 15 16 17 18 19 | ||
| 20 21 22 23 24 25 26 27 28 29 | ||
| 30 31 32 | ||
| } |
There was a problem hiding this comment.
I would prefer restricting array lengths to only powers of two. This has several advantages:
- Bit masking is much faster than modulo.
- We only need to provide array impls for powers of two.
- Growing is easy: just double the size.
You can use the capacity_to_buckets and bucket_mask_to_capacity methods from hashbrown as a reference.
| extern crate alloc; | ||
|
|
||
| #[cfg(test)] | ||
| #[macro_use] |
There was a problem hiding this comment.
You should import macros using use std::<macro name>; rather than using #[macro_use].
| #![warn(rust_2018_idioms)] | ||
| #![no_std] | ||
| #![cfg_attr(feature = "nightly", feature(const_fn))] | ||
| #![cfg_attr(feature = "nightly", feature(const_fn, const_generics))] |
There was a problem hiding this comment.
Const generics aren't needed if we are only providing array impls for power-of-two sizes.
| /// |`remove`|`O(n)`|`O(1)`| | ||
| /// |`replace_next_with`|`O(1)`|`O(1)`| | ||
| /// |`replace_with`|`O(n)`|`O(1)`| | ||
| pub unsafe trait BucketOps { |
There was a problem hiding this comment.
We can simplify this: just implement BucketOps directly on LinkedList/SinglyLinkedList instead of using DefaultBucketOps. This functionality isn't really something we need to let the user customize.
Remove type Bucket and type PointerOps. Add type Pointer and type Value.
| /// It is also possible to create stateful adapters. | ||
| /// This allows links and containers to be separated and avoids the need for objects to be modified to | ||
| /// contain a link. | ||
| pub trait HashTableAdapter { |
There was a problem hiding this comment.
HashTableAdapter can be simplified. Remove PointerOps and BucketOps, the adapter only needs to handle key equality and key hashing. Everything else is handled by the BucketOps from the B array parameter to HashTable.
Intrusive hash table implementation that supports:
Adapter<Link = LinkedListLink>)bucket usage bitmapEntryAPIThere are a few utility traits:
Arrayfor implementing the bucket index and the metadata.DynamicArray: Arrayfor implementing dynamic allocation / resizing.BitsBitsArrayDynamic allocation is entirely optional and can be implemented via the
DynamicArraytrait.