Commit c31590a

lewisclement authored and ErikSchierboom committed
Extract Concepts from v2 exercise parallel-letter-frequency
Closes exercism#253

* Create parallel-letter-frequency.md

  Initial list with concept level abstraction of common elements used for solving `parallel-letter-frequency`.

* Update with specific concurrency concepts separate

  To-do: implement an example for each concurrency concept
1 parent 2f8bc34 commit c31590a

1 file changed: 147 additions and 0 deletions
@@ -0,0 +1,147 @@
# Concepts of Parallel letter frequency

This exercise allows a wide variety of solutions, especially in combination with external crates. To keep the amount of work manageable, this document covers only a small selection of them. Crossbeam solutions involve a set of concepts similar to those listed here, and Rayon requires only a subset.

## Summary
- **Primitives**
  - `usize`
  - `char`
  - Floating-point values
  - Dividing floating-point values vs. dividing integers
- **Immutability/explicit mutability**
- **Strings**
  - `&str` type
- **Unicode**
  - Unicode methods vs. ASCII methods
- **Slices**
- **Tuples**
- **Destructuring**
- **Functions**
  - Higher-order functions
  - Closures
- **Visibility**
- **Data collections**
  - `HashMap`
- **`Option` type**
- **Iterators**
  - Lazy evaluation
  - Consuming/non-consuming
  - As per the second example: `FromIterator` trait
- **For loop**
- **References**
  - Reference counters (`Arc` in this situation)
  - Dereferencing
- **Lifetimes**
  - `'static`
  - `move`
  - Dropping
- **Crates**
- **Parallelism**
  - Spawning and joining

#### Concepts related to parallelism
- **Channels**
- **`Mutex` and `RwLock`**\*
- **Futures**
- **Reference counting**

<sub>\*Related, but not applicable to reasonable solutions of this exercise.</sub>

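The difference between the Unicode and ASCII methods listed above can be sketched as follows. The function names are hypothetical and chosen only for illustration; the `char` methods themselves are from the standard library.

```rust
/// Counts letters with the Unicode-aware `char::is_alphabetic`.
fn count_unicode_letters(text: &str) -> usize {
    text.chars().filter(|c| c.is_alphabetic()).count()
}

/// Counts letters with the ASCII-only `char::is_ascii_alphabetic`.
fn count_ascii_letters(text: &str) -> usize {
    text.chars().filter(|c| c.is_ascii_alphabetic()).count()
}

/// `char::to_lowercase` returns an iterator, because one character
/// may lowercase to several characters; this returns how many.
fn lowercase_len(c: char) -> usize {
    c.to_lowercase().count()
}
```

For the input `"Größe 42"` the Unicode variant counts `ö` and `ß` as letters while the ASCII variant skips them, so a benchmark and a solution that use different families of methods do not measure the same work.
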
## General
- Strings: the `str` type is always a factor here.
- Unicode: performance and feature trade-offs of ASCII vs. Unicode methods. Benchmarking is skewed when the benchmark uses Unicode methods and the solution uses ASCII methods.
- Iterators: iterating over the input or chunks of it.
- Collections: iterating over a `HashMap` and using `Entry`. Thread handles are commonly collected into a `Vec`.
- References
- Numbers: the `Add` trait specifically.
- For loop: `JoinHandle::join` takes the handle by value, so threads cannot be joined through a borrowing iterator; a `for` loop that consumes the collected handles is the usual choice.
- **Opt** Visibility: keep implementation details non-public.
- **Opt** Slices: `.chunks()` is an often-used method for splitting up input for worker threads. Alternatively, subslices are used.
- **Opt** Higher-order functions: the `HashMap`s can be merged by a higher-order function on an iterator.
- **Opt** Types: to divide the input, integers can be cast to floating-point values so that the result can be rounded.
- **Opt** Crates

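Two of the optional points above, rounding the chunk size and merging maps with a higher-order function, can be sketched as below. Both helper names (`chunk_size`, `merge`) are hypothetical, and the `max(1)` guard is an added assumption to keep `chunks` from receiving a size of zero.

```rust
use std::collections::HashMap;

/// Hypothetical helper: a chunk size that yields at most `worker_count` chunks.
fn chunk_size(len: usize, worker_count: usize) -> usize {
    // Integer division truncates (7 / 3 == 2); casting to `f64`
    // allows rounding up with `ceil` instead.
    let size = (len as f64 / worker_count as f64).ceil() as usize;
    // `slice::chunks` panics on a chunk size of 0, so never return less than 1.
    size.max(1)
}

/// Hypothetical helper: merges per-chunk maps by folding them into one map.
fn merge(maps: Vec<HashMap<char, usize>>) -> HashMap<char, usize> {
    maps.into_iter().fold(HashMap::new(), |mut acc, map| {
        for (c, count) in map {
            *acc.entry(c).or_insert(0) += count;
        }
        acc
    })
}
```

With seven inputs and three workers this gives chunks of three, three and one element, so no more than three worker threads are needed.
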
## Std approach with or without channels
- Parallelism - subtopic spawning/joining threads.
- Iterators - subtopic lazy evaluation: spawning and joining threads in the same iterator chain prevents parallelism, because each thread is joined before the next one is spawned.
- Lifetimes - subtopics `'static` lifetime and explicit `move` semantics.
- Iterators/lifetimes - subtopic consuming iterators: joining consumes the handle, so threads can't be joined through a non-consuming iterator.
- `Option` and `Result` types: unwrapping thread or channel results (`join`, `send` and `recv` return a `Result`).
- Functions: thread spawning takes a function as its argument, so higher-order functions are required. Closures are an option here.

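The interaction between lazy iterators and thread joining can be illustrated with a minimal sketch. The function names and the squaring workload are hypothetical stand-ins for the letter counting; only `std::thread` is used.

```rust
use std::thread;

/// Spawns one thread per input, then joins them all and sums the results.
fn parallel_squares(inputs: Vec<u64>) -> u64 {
    // `collect` drives the lazy `map`, so every thread is spawned
    // before any of them is joined.
    let handles: Vec<thread::JoinHandle<u64>> = inputs
        .into_iter()
        .map(|n| thread::spawn(move || n * n))
        .collect();

    let mut total = 0;
    // `join` takes the handle by value, so the loop consumes the `Vec`.
    for handle in handles {
        // `join` returns a `Result`; it is `Err` if the thread panicked.
        total += handle.join().unwrap();
    }
    total
}

/// Same result, but the lazy chain processes one element at a time:
/// each thread is joined right after it is spawned, so nothing overlaps.
fn sequential_squares(inputs: Vec<u64>) -> u64 {
    inputs
        .into_iter()
        .map(|n| thread::spawn(move || n * n))
        .map(|handle| handle.join().unwrap())
        .sum()
}
```

Both functions return the same value; the difference is only in how many threads run at the same time.
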
## Std approach with channels
- Parallelism - subtopic channels.
- Tuples: `mpsc::channel()` returns a tuple.
- Destructuring: the tuple can be destructured on assignment.
- Lifetimes: `drop()`, used to drop the original sender so that the receiver stops waiting once all workers are done.

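A minimal sketch of these channel concepts, assuming a hypothetical `sum_via_channel` function that sums numbers instead of counting letters:

```rust
use std::sync::mpsc;
use std::thread;

/// Sums each part in its own thread and collects the partial sums over a channel.
fn sum_via_channel(parts: Vec<Vec<u32>>) -> u32 {
    // `channel()` returns a `(Sender, Receiver)` tuple, destructured on assignment.
    let (tx, rx) = mpsc::channel();

    for part in parts {
        // Every worker owns its own clone of the sender.
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send(part.iter().sum::<u32>()).unwrap();
        });
    }

    // Without this `drop`, the original sender would stay alive and
    // iterating over `rx` would block forever.
    drop(tx);

    // The iterator ends once every sender has been dropped.
    rx.iter().sum()
}
```

The handles returned by `thread::spawn` are discarded here, because the receiver loop already waits until every worker has sent its result and dropped its sender.
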
## Alternate approach with shared variables
- Reference counting: sharing aggregate data between threads with a reference-counted collection.
- **Honorable mention**: Parallelism - subtopic mutexes. Not necessary or useful for this exercise, but a common concept nonetheless. Related: `RwLock`.

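For completeness, the honorable mention can be sketched with the standard library only. This is an illustration of `Arc` combined with `Mutex`, not a recommended solution: the function name `count_shared` is hypothetical, lowercasing is left out for brevity, and taking the lock for every character serializes most of the work.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Counts letters of all texts into one map shared between the threads.
fn count_shared(texts: Vec<String>) -> HashMap<char, usize> {
    let shared: Arc<Mutex<HashMap<char, usize>>> = Arc::new(Mutex::new(HashMap::new()));

    let handles: Vec<_> = texts
        .into_iter()
        .map(|text| {
            // Cloning the `Arc` only bumps the reference count.
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for c in text.chars().filter(|c| c.is_alphabetic()) {
                    // The guard unlocks the mutex when it goes out of scope.
                    let mut map = shared.lock().unwrap();
                    *map.entry(c).or_insert(0) += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Copy the result out while holding the lock one last time.
    let result = shared.lock().unwrap().clone();
    result
}
```
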
### Example std with channels
```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

pub fn frequency(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    let (tx, rx) = mpsc::channel();

    // Round up so that at most `worker_count` chunks are produced;
    // `max(1)` guards against `chunks(0)`, which panics on empty input.
    let chunk_size = ((input.len() as f64 / worker_count as f64).ceil() as usize).max(1);

    // Collecting forces the lazy iterator to spawn every thread up front.
    let _handles = input
        .chunks(chunk_size)
        .map(|slice| slice.concat())
        .map(|text| {
            let tx = tx.clone();
            thread::spawn(move || {
                let mut map: HashMap<char, usize> = HashMap::new();
                for chr in text.chars().filter(|c| c.is_alphabetic()) {
                    if let Some(c) = chr.to_lowercase().next() {
                        *map.entry(c).or_insert(0) += 1;
                    }
                }
                tx.send(map).unwrap()
            })
        })
        .collect::<Vec<_>>();

    // Drop the original sender so `rx` closes once every worker is done.
    drop(tx);

    let mut result: HashMap<char, usize> = HashMap::new();
    for received in rx {
        for (c, count) in received {
            *result.entry(c).or_insert(0) += count;
        }
    }

    result
}
```

### Example crate with Arc
```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

use dashmap::DashMap;

pub fn frequency(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    let map: Arc<DashMap<char, usize>> = Arc::new(DashMap::new());

    // `max(1)` guards against `chunks(0)`, which panics on empty input.
    let chunk_size = ((input.len() as f64 / worker_count as f64).ceil() as usize).max(1);

    let handles = input
        .chunks(chunk_size)
        .map(|slice| slice.concat())
        .map(|text| {
            // Each worker gets its own reference-counted handle to the shared map.
            let map = Arc::clone(&map);
            thread::spawn(move || {
                for c in text
                    .chars()
                    .filter(|c| c.is_alphabetic())
                    .filter_map(|c| c.to_lowercase().next())
                {
                    *map.entry(c).or_insert(0) += 1;
                }
            })
        })
        .collect::<Vec<_>>();

    // Wait for every worker before reading the shared map.
    for h in handles {
        h.join().unwrap();
    }

    // `collect` builds the `HashMap` through its `FromIterator` implementation.
    map.iter().map(|x| (*x.key(), *x.value())).collect()
}
```
