Concurrent map interface

We need to figure out a safe and useful interface for concurrent maps that all implementations in crossbeam would then provide. This is my proposal of how it could look.

First, the basic functions mirroring the ones in the standard `HashMap`:
```rust
fn insert(&self, k: K, v: V) -> Option<V>;
fn contains_key<Q: ?Sized>(&self, k: &Q) -> bool where K: Borrow<Q>, Q: Hash + Eq;
fn remove<Q: ?Sized>(&self, k: &Q) -> Option<V> where K: Borrow<Q>, Q: Hash + Eq;
fn clear(&self);
fn len(&self) -> usize;
fn is_empty(&self) -> bool;
```
All methods obviously take an immutable `&self`.

Notice that there is no element access. The standard `get` returns a borrow `Option<&V>` and we cannot ever safely provide that since other threads may remove it in the meantime. Now, it is possible to provide a function like:
```rust
fn borrow<Q: ?Sized>(&self, k: &Q) -> Option<Guard<V>> where K: Borrow<Q>, Q: Hash + Eq;
```
where `Guard<V>` would provide immutable borrows of the value and maybe a possibility to mutate it (a somewhat similar idea to `Entry`). I can think of two ways to do this:
The first would be to make `Guard` pin the epoch on creation and unpin it on destruction. However, since leaking destructors is safe, leaking a _single_ `Guard` would disallow recollection of any memory from the pinning thread since the pin would stay on forever. (The same is true of `mem::epoch::Guard`, but we expect to only use it inside algorithms which are highly scrutinized due to the unsafe `unlinked` calls.) This is therefore most likely a terrible idea.
The second way would be to use reference counting (probably `Arc`) internally. This is pointless, since instead we can provide the following interface:
```rust
fn find<Q: ?Sized>(&self, k: &Q) -> Option<V> where K: Borrow<Q>, Q: Hash + Eq, V: Clone;
```
I used a different function name to signify the difference from single-threaded maps.
The user can then just set `V` to `Arc<T>` and have shared access to `T`. Hence, returning `Guard`s only makes sense if it can be done without pinning the epoch or reference counting __and__ in a faster way.

One thing that's neat and easily doable here is (@DemiMarie's idea) a closure that operates on the borrow:
```rust
fn with<Q: ?Sized, F>(&self, k: &Q, f: F)
  where K: Borrow<Q>,
        Q: Hash + Eq,
        F: FnOnce(Option<&V>);
...
map.with("key", |opt| println!("is key present? {}", opt.is_some()); );
```
Internally this would pin the epoch, avoiding the problem I outlined above by not allowing arbitrary epoch guard leakage.

Then there are the various iterator APIs: `Values`, `Keys,` `Iter`, `Drain`. Providing them would give the concurrent map great expressive power. However, there is the _small_ problem of everything being able to change concurrently, invalidating iterators. I think the smartest move here is to gather information on what would be actually useful and viable in this context instead of mirroring the standard library.

Finally, there is the important problem of atomic transactions. In a single-threaded map a user might do something like:
```rust
// Guarantees that 0 maps to "zero" iff it didn't already map to something else.
if (!map.contains_key(0)) {
  map.insert(0, "zero");
}
```
For concurrent maps, doing this would require wrapping the structure in something like `Mutex` to guarantee that two threads don't both modify the value. This of course nullifies the whole gain from lock-free algorithms.
Now, we might provide functions for often-used operations like the above one, maybe
```rust
/// Inserts a key-value pair iff the key wasn't observed in the structure.
/// Returns `true` if it succeeds, `false` otherwise.
fn insert_new(&self, key: K, val: V) -> bool;
```
In general though, it's inviable to implement every single thing people might need. There is, however, [a promising approach](https://dl.acm.org/citation.cfm?id=2935780) (unfortunately paywalled) to this problem that works for list-based implementations.

It would be great to then provide a functional `Combinator` API that would aggregate arbitrary operations like `Iterator`s do and execute them in a single atomic transaction on destruction, maybe like so:
```rust
map.combine().contains_or(0, |access| { access.insert(0, "zero"); } );
```
This would still be somewhat limited to what can be called on the `Combinator` and complex boolean conditions might or might not be expressible, but it's a beginning. It's definitely possible to think of something much nicer, but in the end it will always be a bit awkward to use, since to execute the transaction we need knowledge of each and every its invariant. Procedural macros might help here, but internally will still expand to long function chains. I would be happy to be proven wrong here.

Lastly, it may be worth it to have a generic trait, something like `ConcurrentMap<K, V>` expressing such an interface. Notice however, that the atomic transaction implementation I linked above only applies to linked structures. Hash maps could use STM or such, but I'm unsure whether it's a good idea to force them to provide the transaction API (depending on whether it impacts the performance of the whole structure, not just the relevant methods). We could also have similar interfaces for other structures such as queues, as @schets suggests in #56.

I would be happy to hear any feedback you might have on this.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Concurrent map interface #109

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Concurrent map interface #109

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions