We need to figure out a safe and useful interface for concurrent maps that all implementations in crossbeam would then provide. This is my proposal of how it could look.
First, the basic functions mirroring the ones in the standard HashMap:
fn insert(&self, k: K, v: V) -> Option<V>;
fn contains_key<Q: ?Sized>(&self, k: &Q) -> bool where K: Borrow<Q>, Q: Hash + Eq;
fn remove<Q: ?Sized>(&self, k: &Q) -> Option<V> where K: Borrow<Q>, Q: Hash + Eq;
fn clear(&self);
fn len(&self) -> usize;
fn is_empty(&self) -> bool;
All methods obviously take an immutable &self.
Notice that there is no element access. The standard get returns a borrow Option<&V> and we cannot ever safely provide that since other threads may remove it in the meantime. Now, it is possible to provide a function like:
fn borrow<Q: ?Sized>(&self, k: &Q) -> Option<Guard<V>> where K: Borrow<Q>, Q: Hash + Eq;
where Guard<V> would provide immutable borrows of the value and maybe a possibility to mutate it (a somewhat similar idea to Entry). I can think of two ways to do this:
The first would be to make Guard pin the epoch on creation and unpin it on destruction. However, since leaking destructors is safe, leaking a single Guard would disallow recollection of any memory from the pinning thread since the pin would stay on forever. (The same is true of mem::epoch::Guard, but we expect to only use it inside algorithms which are highly scrutinized due to the unsafe unlinked calls.) This is therefore most likely a terrible idea.
The second way would be to use reference counting (probably Arc) internally. This is pointless, since instead we can provide the following interface:
fn find<Q: ?Sized>(&self, k: &Q) -> Option<V> where K: Borrow<Q>, Q: Hash + Eq, V: Clone;
I used a different function name to signify the difference from single-threaded maps.
The user can then just set V to Arc<T> and have shared access to T. Hence, returning Guards only makes sense if it can be done without pinning the epoch or reference counting and in a faster way.
One thing that's neat and easily doable here is (@DemiMarie's idea) a closure that operates on the borrow:
fn with<Q: ?Sized, F>(&self, k: &Q, f: F)
where K: Borrow<Q>,
Q: Hash + Eq,
F: FnOnce(Option<&V>);
...
map.with("key", |opt| println!("is key present? {}", opt.is_some()); );
Internally this would pin the epoch, avoiding the problem I outlined above by not allowing arbitrary epoch guard leakage.
Then there are the various iterator APIs: Values, Keys, Iter, Drain. Providing them would give the concurrent map great expressive power. However, there is the small problem of everything being able to change concurrently, invalidating iterators. I think the smartest move here is to gather information on what would be actually useful and viable in this context instead of mirroring the standard library.
Finally, there is the important problem of atomic transactions. In a single-threaded map a user might do something like:
// Guarantees that 0 maps to "zero" iff it didn't already map to something else.
if (!map.contains_key(0)) {
map.insert(0, "zero");
}
For concurrent maps, doing this would require wrapping the structure in something like Mutex to guarantee that two threads don't both modify the value. This of course nullifies the whole gain from lock-free algorithms.
Now, we might provide functions for often-used operations like the above one, maybe
/// Inserts a key-value pair iff the key wasn't observed in the structure.
/// Returns `true` if it succeeds, `false` otherwise.
fn insert_new(&self, key: K, val: V) -> bool;
In general though, it's inviable to implement every single thing people might need. There is, however, a promising approach (unfortunately paywalled) to this problem that works for list-based implementations.
It would be great to then provide a functional Combinator API that would aggregate arbitrary operations like Iterators do and execute them in a single atomic transaction on destruction, maybe like so:
map.combine().contains_or(0, |access| { access.insert(0, "zero"); } );
This would still be somewhat limited to what can be called on the Combinator and complex boolean conditions might or might not be expressible, but it's a beginning. It's definitely possible to think of something much nicer, but in the end it will always be a bit awkward to use, since to execute the transaction we need knowledge of each and every its invariant. Procedural macros might help here, but internally will still expand to long function chains. I would be happy to be proven wrong here.
Lastly, it may be worth it to have a generic trait, something like ConcurrentMap<K, V> expressing such an interface. Notice however, that the atomic transaction implementation I linked above only applies to linked structures. Hash maps could use STM or such, but I'm unsure whether it's a good idea to force them to provide the transaction API (depending on whether it impacts the performance of the whole structure, not just the relevant methods). We could also have similar interfaces for other structures such as queues, as @schets suggests in #56.
I would be happy to hear any feedback you might have on this.
We need to figure out a safe and useful interface for concurrent maps that all implementations in crossbeam would then provide. This is my proposal of how it could look.
First, the basic functions mirroring the ones in the standard
HashMap:All methods obviously take an immutable
&self.Notice that there is no element access. The standard
getreturns a borrowOption<&V>and we cannot ever safely provide that since other threads may remove it in the meantime. Now, it is possible to provide a function like:where
Guard<V>would provide immutable borrows of the value and maybe a possibility to mutate it (a somewhat similar idea toEntry). I can think of two ways to do this:The first would be to make
Guardpin the epoch on creation and unpin it on destruction. However, since leaking destructors is safe, leaking a singleGuardwould disallow recollection of any memory from the pinning thread since the pin would stay on forever. (The same is true ofmem::epoch::Guard, but we expect to only use it inside algorithms which are highly scrutinized due to the unsafeunlinkedcalls.) This is therefore most likely a terrible idea.The second way would be to use reference counting (probably
Arc) internally. This is pointless, since instead we can provide the following interface:I used a different function name to signify the difference from single-threaded maps.
The user can then just set
VtoArc<T>and have shared access toT. Hence, returningGuards only makes sense if it can be done without pinning the epoch or reference counting and in a faster way.One thing that's neat and easily doable here is (@DemiMarie's idea) a closure that operates on the borrow:
Internally this would pin the epoch, avoiding the problem I outlined above by not allowing arbitrary epoch guard leakage.
Then there are the various iterator APIs:
Values,Keys,Iter,Drain. Providing them would give the concurrent map great expressive power. However, there is the small problem of everything being able to change concurrently, invalidating iterators. I think the smartest move here is to gather information on what would be actually useful and viable in this context instead of mirroring the standard library.Finally, there is the important problem of atomic transactions. In a single-threaded map a user might do something like:
For concurrent maps, doing this would require wrapping the structure in something like
Mutexto guarantee that two threads don't both modify the value. This of course nullifies the whole gain from lock-free algorithms.Now, we might provide functions for often-used operations like the above one, maybe
In general though, it's inviable to implement every single thing people might need. There is, however, a promising approach (unfortunately paywalled) to this problem that works for list-based implementations.
It would be great to then provide a functional
CombinatorAPI that would aggregate arbitrary operations likeIterators do and execute them in a single atomic transaction on destruction, maybe like so:This would still be somewhat limited to what can be called on the
Combinatorand complex boolean conditions might or might not be expressible, but it's a beginning. It's definitely possible to think of something much nicer, but in the end it will always be a bit awkward to use, since to execute the transaction we need knowledge of each and every its invariant. Procedural macros might help here, but internally will still expand to long function chains. I would be happy to be proven wrong here.Lastly, it may be worth it to have a generic trait, something like
ConcurrentMap<K, V>expressing such an interface. Notice however, that the atomic transaction implementation I linked above only applies to linked structures. Hash maps could use STM or such, but I'm unsure whether it's a good idea to force them to provide the transaction API (depending on whether it impacts the performance of the whole structure, not just the relevant methods). We could also have similar interfaces for other structures such as queues, as @schets suggests in #56.I would be happy to hear any feedback you might have on this.