| 1 | +# Concepts of Parallel letter frequency |
| 2 | + |
| 3 | +This exercise allows a wide variety of solutions, especially when external crates are used. To keep the amount of work manageable, this document covers a small selection of them. Crossbeam solutions involve a set of concepts similar to those listed here, and Rayon requires only a subset. |
| 4 | + |
| 5 | +## Summary |
| 6 | +- **Primitives** |
| 7 | + - `usize` |
| 8 | + - `char` |
| 9 | + - floating point values |
| 10 | + - Dividing floating points vs dividing integers |
| 11 | +- **Immutability/explicit mutability** |
| 12 | +- **Strings** |
| 13 | + - `&str` type |
| 14 | +- **Unicode** |
| 15 | + - Unicode methods vs ascii methods |
| 16 | +- **Slices** |
| 17 | +- **Tuples** |
| 18 | +- **Destructuring** |
| 19 | +- **Functions** |
| 20 | + - Higher order functions |
| 21 | + - Closures |
| 22 | +- **Visibility** |
| 23 | +- **Data collections** |
| 24 | + - `HashMap` |
| 25 | +- **`Option` type** |
| 26 | +- **Iterators** |
| 27 | + - Lazy evaluation |
| 28 | + - Consuming/non-consuming |
| 29 | + - As per the second example: `FromIterator` trait |
| 30 | +- **For loop** |
| 31 | +- **References** |
| 32 | + - Reference counters (`Arc` in this situation) |
| 33 | + - Dereferencing |
| 34 | +- **Lifetimes** |
| 35 | + - `'static` |
| 36 | + - `move` |
| 37 | + - Dropping |
| 38 | +- **Crates** |
| 39 | +- **Parallelism** |
| 40 | + - Spawning and joining |
| 41 | +#### Concepts related to parallelism |
| 42 | +- **Channels** |
| 43 | +- **`Mutex` and `RwLock`*** |
| 44 | +- **Futures** |
| 45 | +- **Reference counting** |
| 46 | + |
| 47 | +*<sub>*Related, but not applicable here for reasonable solutions</sub>* |
| 48 | + |
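The "Dividing floating points vs dividing integers" item matters when the input is split into chunks. The sketch below is only an illustration; the function names are hypothetical and `workers` is assumed to be greater than zero. It contrasts truncating integer division with the rounded-up floating-point division used in the examples further down.

```rust
/// Chunk size from plain integer division, which truncates toward zero.
/// Assumes `workers > 0`; a zero divisor panics.
pub fn chunk_size_truncating(len: usize, workers: usize) -> usize {
    len / workers
}

/// Chunk size from floating-point division, rounded up.
/// `.max(1)` guards against a size of 0, which `slice::chunks` rejects.
pub fn chunk_size_ceil(len: usize, workers: usize) -> usize {
    ((len as f64 / workers as f64).ceil() as usize).max(1)
}

/// Number of chunks that `slice::chunks` yields for `len` items.
pub fn chunk_count(len: usize, size: usize) -> usize {
    let items = vec![0u8; len];
    items.chunks(size).count()
}
```

With 7 inputs and 3 workers, the truncated size of 2 produces 4 chunks, one more than the requested number of workers, while the rounded-up size of 3 produces exactly 3.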
| 49 | +## General |
| 50 | +- Strings: the `str` type specifically is always a factor here. |
| 51 | +- Unicode: performance/feature trade-offs of ASCII vs Unicode methods. Benchmarks are skewed when the benchmark uses Unicode methods and the solution uses ASCII methods. |
| 52 | +- Iterators: iterating over input or chunks of it. |
| 53 | +- Collections: iterating over a `HashMap` and using `Entry`. Thread handles are commonly collected into a `Vec`. |
| 54 | +- References |
| 55 | +- Numbers: `Add` trait specifically |
| 56 | +- For loop: `join` takes the handle by value, so threads can't be joined through a borrowing Iterator; a `for` loop over the collected handles is the usual way. |
| 57 | +- **Opt** Visibility: keep implementation details non-public. |
| 58 | +- **Opt** Slices: `.chunks()` is an often-used method for splitting up input for worker threads. Alternatively, subslices are used. |
| 59 | +- **Opt** Higher-order functions: the per-thread `HashMap`s can be merged with a higher-order function on an Iterator, such as `fold`. |
| 60 | +- **Opt** Types: to divide the input, integers can be cast to floating-point values so that the chunk size can be rounded up. |
| 61 | +- **Opt** Crates |
| 62 | + |
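Two of the items above, Unicode vs ASCII methods and merging maps with a higher-order function, can be sketched as follows. This is a minimal illustration rather than part of the reference solutions; `count_unicode`, `count_ascii` and `merge` are hypothetical names.

```rust
use std::collections::HashMap;

/// Counts letters with the Unicode-aware `char` methods.
pub fn count_unicode(text: &str) -> HashMap<char, usize> {
    let mut map = HashMap::new();
    for c in text.chars().filter(|c| c.is_alphabetic()) {
        // `to_lowercase` returns an iterator; only its first char is used here.
        if let Some(lower) = c.to_lowercase().next() {
            *map.entry(lower).or_insert(0) += 1;
        }
    }
    map
}

/// Counts letters with the ASCII-only methods; non-ASCII letters are skipped.
pub fn count_ascii(text: &str) -> HashMap<char, usize> {
    let mut map = HashMap::new();
    for c in text.chars().filter(|c| c.is_ascii_alphabetic()) {
        *map.entry(c.to_ascii_lowercase()).or_insert(0) += 1;
    }
    map
}

/// Merges per-chunk maps into one with `fold` and the `Entry` API.
pub fn merge(maps: Vec<HashMap<char, usize>>) -> HashMap<char, usize> {
    maps.into_iter().fold(HashMap::new(), |mut acc, map| {
        for (c, n) in map {
            *acc.entry(c).or_insert(0) += n;
        }
        acc
    })
}
```

The two counting functions give different results for any input with non-ASCII letters, which is why comparing timings between them is not a like-for-like benchmark.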
| 63 | +## Std approach with or without channels |
| 64 | +- Parallelism - subtopic spawning/joining threads. |
| 65 | +- Iterators - subtopic lazily-evaluated: spawning and joining threads in the same Iterator chain prevents parallelism, because each thread is joined before the next one is spawned. |
| 66 | +- Lifetimes - subtopics `'static` lifetime and explicit `move` semantics. |
| 67 | +- Iterators/lifetimes - subtopic consuming Iterators: `join` takes the handle by value, so threads can't be joined through a borrowing Iterator. |
| 68 | +- Option type: unwrapping thread or channel results (strictly speaking, `join`, `send` and `recv` return a `Result`). |
| 69 | +- Functions: `thread::spawn` is a higher-order function, so it needs a function as its argument. Closures are an option here. |
| 70 | + |
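The interaction between lazy Iterators and thread joining can be sketched with a toy task that counts characters. Both function names are hypothetical. The first version collects the handles before joining; the second keeps everything in one lazy chain and therefore runs one thread at a time.

```rust
use std::thread;

/// Spawns all threads first, then joins them: the work can overlap.
pub fn total_len(inputs: &[&str]) -> usize {
    let handles: Vec<thread::JoinHandle<usize>> = inputs
        .iter()
        // An owned String satisfies the 'static bound of `thread::spawn`.
        .map(|s| s.to_string())
        .map(|owned| thread::spawn(move || owned.chars().count()))
        // `collect` forces every spawn to happen before any join.
        .collect();

    let mut total = 0;
    for handle in handles {
        // `join` consumes the handle and returns a `Result`.
        total += handle.join().unwrap();
    }
    total
}

/// Spawns and joins in one lazy chain: each item is spawned and
/// joined before the next one is pulled, so nothing runs in parallel.
pub fn total_len_sequential(inputs: &[&str]) -> usize {
    inputs
        .iter()
        .map(|s| s.to_string())
        .map(|owned| thread::spawn(move || owned.chars().count()))
        .map(|handle| handle.join().unwrap())
        .sum()
}
```

Both functions return the same total; they differ only in whether the threads are allowed to run concurrently.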
| 71 | +## Std approach with channels |
| 72 | +- Parallelism - subtopic channels. |
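| 73 | +- Tuples: `mpsc::channel()` returns a tuple of sender and receiver. |
| 74 | +- Destructuring: the tuple can be destructured on assignment. |
| 75 | +- Lifetimes: `drop()` the original sender so that iterating over the receiver ends once all workers have finished. |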
| 76 | + |
| 77 | +## Alternate approach with shared variables |
| 78 | +- Reference counting: Sharing aggregate data between threads with a reference counted collection. |
| 79 | +- **Honorable mention**: Parallelism - subtopic Mutexes. Not necessary/useful for this exercise, but a common concept nonetheless. Related: `RwLock`. |
| 80 | + |
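For completeness, the honorable mention can be sketched with the standard library only. This is an assumption-laden illustration, not a recommended solution: `frequency_with_mutex` is a hypothetical name, and each worker counts locally and takes the lock once, which is one way to keep contention low.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

pub fn frequency_with_mutex(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    let shared: Arc<Mutex<HashMap<char, usize>>> = Arc::new(Mutex::new(HashMap::new()));
    // Round up, and never pass 0 to `chunks`, which would panic.
    let chunk_size = ((input.len() as f64 / worker_count as f64).ceil() as usize).max(1);

    let handles: Vec<_> = input
        .chunks(chunk_size)
        .map(|chunk| chunk.concat())
        .map(|text| {
            // Each worker gets its own reference-counted pointer to the map.
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let mut local: HashMap<char, usize> = HashMap::new();
                for c in text.chars().filter(|c| c.is_alphabetic()) {
                    if let Some(lower) = c.to_lowercase().next() {
                        *local.entry(lower).or_insert(0) += 1;
                    }
                }
                // Lock once per worker and fold the local counts in.
                let mut guard = shared.lock().unwrap();
                for (c, n) in local {
                    *guard.entry(c).or_insert(0) += n;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // All workers are done; copy the totals out of the shared map.
    let result = shared.lock().unwrap().clone();
    result
}
```

A call such as `frequency_with_mutex(&["aA", "b"], 2)` counts `'a'` twice and `'b'` once.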
| 81 | +### Example std with channels |
| 82 | +```rust |
| 83 | +use std::collections::HashMap; |
| 84 | +use std::thread; |
| 85 | +use std::sync::mpsc; |
| 86 | + |
| 87 | +pub fn frequency(input: &[&str], worker_count: usize) -> HashMap<char, usize> { |
| 88 | +    let (tx, rx) = mpsc::channel(); |
| 89 | + |
| 90 | +    let _handles = input.chunks(((input.len() as f64 / worker_count as f64).ceil() as usize).max(1)) // a chunk size of 0 would panic on empty input |
| 91 | +        .map(|slice| slice.concat()) |
| 92 | +        .map(|slice| { |
| 93 | +            let tx = tx.clone(); // one sender per worker |
| 94 | +            thread::spawn(move || { |
| 95 | +                let mut map: HashMap<char, usize> = HashMap::new(); |
| 96 | +                for chr in slice.chars().filter(|c| c.is_alphabetic()) { |
| 97 | +                    if let Some(c) = chr.to_lowercase().next() { |
| 98 | +                        *map.entry(c).or_insert(0) += 1; |
| 99 | +                    } |
| 100 | +                } |
| 101 | +                tx.send(map).unwrap() |
| 102 | +            }) |
| 103 | +        }).collect::<Vec<_>>(); // collect forces all threads to be spawned |
| 104 | + |
| 105 | +    drop(tx); // drop the original sender so the loop over rx can end |
| 106 | + |
| 107 | +    let mut result: HashMap<char, usize> = HashMap::new(); |
| 108 | +    for received in rx { |
| 109 | +        for (c, count) in received { |
| 110 | +            *result.entry(c).or_insert(0) += count; |
| 111 | +        } |
| 112 | +    } |
| 113 | + |
| 114 | +    result |
| 115 | +} |
| 116 | +``` |
| 117 | + |
| 118 | +### Example crate with Arc |
| 119 | +```rust |
| 120 | +use std::collections::HashMap; |
| 121 | +use std::iter::FromIterator; // `collect()` below builds the HashMap through this trait |
| 122 | +use std::sync::Arc; |
| 123 | +use dashmap::DashMap; |
| 124 | + |
| 125 | +pub fn frequency(input: &[&str], worker_count: usize) -> HashMap<char, usize> { |
| 126 | +    let map: Arc<DashMap<char, usize>> = Arc::new(DashMap::new()); |
| 127 | + |
| 128 | +    let handles = input |
| 129 | +        .chunks(((input.len() as f64 / worker_count as f64).ceil() as usize).max(1)) // a chunk size of 0 would panic on empty input |
| 130 | +        .map(|slice| slice.concat()) |
| 131 | +        .map(|slice| { |
| 132 | +            let map = Arc::clone(&map); // one reference-counted pointer per worker |
| 133 | +            std::thread::spawn(move || { |
| 134 | +                for c in slice.chars().filter(|c| c.is_alphabetic()) |
| 135 | +                    .flat_map(|c| c.to_lowercase().next()) { |
| 136 | +                    *map.entry(c).or_insert(0) += 1; |
| 137 | +                } |
| 138 | +            }) |
| 139 | +        }).collect::<Vec<_>>(); |
| 140 | + |
| 141 | +    for h in handles { |
| 142 | +        h.join().unwrap(); |
| 143 | +    } |
| 144 | + |
| 145 | +    map.iter().map(|x| (*x.key(), *x.value())).collect() |
| 146 | +} |
| 147 | +``` |