# cuckoo-filter
A Go implementation of the cuckoo filter, customizable by you.

Ported from [efficient/cuckoofilter](https://github.com/efficient/cuckoofilter).

[中文文档](./README_ZH.md)

Overview
--------
A cuckoo filter is a Bloom filter replacement for approximate set-membership queries. While Bloom filters are well-known space-efficient data structures for answering queries like "is item x in the set?", they do not support deletion. Variants that enable deletion (like counting Bloom filters) usually require much more space.

Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (hence the name). It is essentially a cuckoo hash table that stores each key's fingerprint. Cuckoo hash tables can be highly compact, so a cuckoo filter can use less space than a conventional Bloom filter for applications that require low false positive rates (< 3%).

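To make the "hash table of fingerprints" idea concrete, here is a minimal sketch of partial-key cuckoo hashing as described in the paper: each item maps to a short fingerprint and two candidate buckets, and either bucket index can be recovered from the other using only the fingerprint. The names `hash64`, `fingerprint`, `primaryIndex`, and `altIndex` are hypothetical, FNV-1a is only a stand-in hash, and none of this is taken from this library's internals.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// numBuckets is assumed to be a power of two so that masking keeps
// indices in range and the XOR step below is its own inverse.
const numBuckets = 1 << 10

// hash64 is a stand-in hash function (FNV-1a), chosen only for illustration.
func hash64(data []byte) uint64 {
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64()
}

// fingerprint derives an f-bit tag from the item. Zero is remapped to 1
// on the assumption that 0 marks an empty slot.
func fingerprint(item []byte, f uint) uint32 {
	fp := uint32(hash64(item)>>32) & (1<<f - 1)
	if fp == 0 {
		fp = 1
	}
	return fp
}

// primaryIndex is the first candidate bucket for an item.
func primaryIndex(item []byte) uint32 {
	return uint32(hash64(item)) & (numBuckets - 1)
}

// altIndex computes the other candidate bucket from a bucket index and
// the fingerprint alone, so a stored tag can be relocated without the key.
func altIndex(i, fp uint32) uint32 {
	return (i ^ (fp * 0x5bd1e995)) & (numBuckets - 1)
}

func main() {
	item := []byte("A")
	fp := fingerprint(item, 9)
	i1 := primaryIndex(item)
	i2 := altIndex(i1, fp)
	fmt.Println(fp, i1, i2)
	// Applying altIndex twice returns to the starting bucket.
	fmt.Println(altIndex(i2, fp) == i1) // prints "true"
}
```

Because only the fingerprint is stored, deletion amounts to removing a matching fingerprint from one of the two candidate buckets.
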
For details about the algorithm, and for citations, please refer to:

["Cuckoo Filter: Practically Better Than Bloom"](http://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf) in the proceedings of ACM CoNEXT 2014 by Bin Fan, Dave Andersen and Michael Kaminsky

## Implementation details

The paper cited above leaves several parameters to choose:

1. Bucket size (b): the number of fingerprints per bucket
2. Fingerprint size (f): the size in bits of each fingerprint (the hash tag stored for an item)

In other implementations:

- [seiflotfy/cuckoofilter](https://github.com/seiflotfy/cuckoofilter) uses b = 4 and f = 8 bits, which corresponds to a false positive rate of `r ~= 0.03`.
- [panmari/cuckoofilter](https://github.com/panmari/cuckoofilter) uses b = 4 and f = 16 bits, which corresponds to a false positive rate of `r ~= 0.0001`.
- [irfansharif/cfilter](https://github.com/irfansharif/cfilter) lets you adjust b and f, but f can only be set to a multiple of 8 bits, that is, a whole number of bytes.

In this implementation, you can set b and f to any value you want. The semi-sorting buckets described in the paper are also available, which can save 1 bit per item.

##### Why is customization important?

According to the paper:

- Different bucket sizes result in different load factors, that is, different occupancy rates of the filter.
- Different bucket sizes suit different target false positive rates.
- To maintain a given false positive rate, a bigger bucket size requires a bigger fingerprint size.

Given a target false positive rate of `r`:

> when r > 0.002, having two entries per bucket yields slightly better results than using four entries per bucket; when r decreases to 0.00001 < r ≤ 0.002, four entries per bucket minimizes space.

With a bucket size `b`, they suggest choosing the fingerprint size `f` using

    f >= log2(2b/r) bits

At the same time, note that the load factor reaches 84%, 95%, or 98% when using bucket size b = 2, 4, or 8, respectively.

##### To learn more about choosing parameters, refer to Section 5 of the paper

Note: b = 8 is generally enough. Without further supporting data, we suggest choosing b from 2, 4, or 8. The maximum value of f is 32 bits.

## Example usage

```go
package main

import (
	"fmt"

	cuckoo "github.com/linvon/cuckoo-filter"
)

func main() {
	// Bucket size b = 4, fingerprint size f = 9 bits, capacity of 3900 items,
	// using the packed table type.
	cf := cuckoo.NewFilter(4, 9, 3900, cuckoo.TableTypePacked)
	fmt.Println(cf.Info())
	fmt.Println(cf.FalsePositiveRate())

	// Add an item, then check membership and the item count.
	a := []byte("A")
	cf.Add(a)
	fmt.Println(cf.Contain(a))
	fmt.Println(cf.Size())

	// Serialize the filter and restore it into a new one.
	b := cf.Encode()
	ncf, err := cuckoo.Decode(b)
	if err != nil {
		panic(err)
	}
	fmt.Println(ncf.Contain(a))

	// Delete the item and print the item count again.
	cf.Delete(a)
	fmt.Println(cf.Size())
}
```
