Open
Description
During benchmark, we found that the underlying summary implementation, i.e. bquant, wastes much CPU on runtime.memorymove
.
The benchmark here, https://github.com/beorn7/perks/blob/master/quantile/bench_test.go#L7-L23, cannot show the real performance since the recorded element is monotonically increasing. Thus lines containing copy
are not reached.
After using container/list
, we get a better performance with a series of random float64 inserted, which is much more similar to the real scenario.
Though using doubly-linked list introduce obj allocation, it perform double qps under micro-benchmark,
Linked List
goos: darwin
goarch: arm64
pkg: github.com/beorn7/perks/quantile
BenchmarkInsertTargetedSmallEpsilon
BenchmarkInsertTargetedSmallEpsilon-20 6934218 169.8 ns/op 96 B/op 3 allocs/op
PASS
Slice
goos: darwin
goarch: arm64
pkg: github.com/beorn7/perks/quantile
BenchmarkInsertTargetedSmallEpsilon
BenchmarkInsertTargetedSmallEpsilon-20 3421015 343.3 ns/op 0 B/op 0 allocs/op
PASS