[zstd][cli] Add performance counters support to bench mode #4354
base: dev
Adding an extra parameter, used while running in benchmark mode, to collect processor performance counters, so that performance stats can be reported per operation (e.g. compression vs decompression).

We can collect the following performance counters using the Linux perf API: CPU cycles, instructions, branch misses, cache hits and cache misses. One advantage of leveraging the Linux perf API is that it should work on any processor that runs Linux, so it should work fine on x86-64 (Intel and AMD), Arm (arm32/aarch64) and RISC-V.

The counters allow generating interesting stats like cycles/byte, a measure that is helpful for comparing different CPU microarchitectures because it is independent of clock speed. Plus, cycles spent on I/O operations (e.g. reading files from disk), which a regular 'perf stat' would include, will *not* be counted, since counters are only captured during the main benchmark loop.

This patch is still in its early stages: the idea is to gather feedback and properly address its current shortcomings, to progress towards a contribution that can be landed in zstd.
Running with the help flag should print this:
Compress or decompress the INPUT file(s); reads from STDIN if INPUT is
Usage: zstd [OPTIONS...] [INPUT... | -] [-o OUTPUT]
Two examples when the flag is enabled:
a) Synthetic: Perf cycles: 326893971910 -> 3239077 (x3.087), 487.4 MB/s, 2636.7 MB/s1#
b) With file input: Perf cycles: 427627890230 -> 190860020 (x5.017), 851.1 MB/s, 2906.7 MB/s1#
The basic idea is to add to the benchmark mode a way to measure CPU stats more precisely per operation (e.g. compression vs decompression), remove cycles spent on I/O from the equation, and allow calculating some extra stats (e.g. cycles/byte).
If this is a feature that could be helpful to zstd, I can further develop the patch to get it into a "land-able" state. This is just an early draft with the basic idea, a PoC (Proof of Concept).
I considered using the RDPMC instruction, but its behavior differs between x86-64 implementations (i.e. Intel vs AMD), and it would be x86-64 only. On the other hand, RDPMC may make it possible to collect some extra counters that are not available through the Linux perf API. @Cyan4973 thoughts?
I believe this is a good topic.
**NOT FOR LANDING**
Adding an extra parameter (-y), used while running in benchmark mode, to collect processor performance counters, so that performance stats can be reported per operation (e.g. compression vs decompression).
We can collect the following performance counters using the Linux perf API: CPU cycles, instructions, branch misses, cache hits and cache misses.
One advantage of leveraging the Linux perf API is that it should work on any processor that runs Linux, so it should work fine on x86-64 (Intel and AMD), Arm (arm32/aarch64) and RISC-V.
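As a rough illustration of the counters listed above, here is a minimal sketch, assuming a Linux system with `<linux/perf_event.h>` available, that opens them as one perf event group through the `perf_event_open` syscall. The generic `PERF_TYPE_HARDWARE` events are what make this portable: the kernel maps each one onto the running CPU's own PMU event where it is supported. All function names here (`init_attr`, `open_counter_group`, `counters_start`, `counters_stop`, `close_counter_group`) are hypothetical and not taken from the patch. Note that the generic events expose cache *references* and misses rather than hits.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Generic hardware events: the kernel maps each one to the matching
 * PMU event of the running CPU, if that CPU supports it. */
static const uint64_t kEvents[] = {
    PERF_COUNT_HW_CPU_CYCLES,
    PERF_COUNT_HW_INSTRUCTIONS,
    PERF_COUNT_HW_BRANCH_MISSES,
    PERF_COUNT_HW_CACHE_REFERENCES,
    PERF_COUNT_HW_CACHE_MISSES,
};
#define NUM_EVENTS (sizeof(kEvents) / sizeof(kEvents[0]))

/* Fills the attributes for one generic hardware event. */
static void init_attr(struct perf_event_attr *attr, uint64_t config, int is_leader)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = config;
    attr->disabled = is_leader ? 1 : 0;    /* members only count while the leader is enabled */
    attr->exclude_kernel = 1;              /* count user-space execution only */
    attr->exclude_hv = 1;
    attr->read_format = PERF_FORMAT_GROUP; /* one read() on the leader returns all values */
}

/* Opens all events as one group on the calling thread, any CPU. Returns 0 or -1. */
static int open_counter_group(int fds[NUM_EVENTS])
{
    for (size_t i = 0; i < NUM_EVENTS; i++) {
        struct perf_event_attr attr;
        init_attr(&attr, kEvents[i], i == 0);
        int group_fd = (i == 0) ? -1 : fds[0];
        fds[i] = (int)syscall(SYS_perf_event_open, &attr, 0, -1, group_fd, 0UL);
        if (fds[i] < 0) {
            while (i > 0)
                close(fds[--i]); /* undo the events opened so far */
            return -1;
        }
    }
    return 0;
}

/* Zeroes and starts every counter in the group at once. */
static void counters_start(const int fds[NUM_EVENTS])
{
    ioctl(fds[0], PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP);
    ioctl(fds[0], PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
}

/* Stops the group and copies the values out, in kEvents order. */
static int counters_stop(const int fds[NUM_EVENTS], uint64_t out[NUM_EVENTS])
{
    struct { uint64_t nr; uint64_t values[NUM_EVENTS]; } buf; /* PERF_FORMAT_GROUP layout */
    ioctl(fds[0], PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);
    if (read(fds[0], &buf, sizeof(buf)) != (ssize_t)sizeof(buf) || buf.nr != NUM_EVENTS)
        return -1;
    memcpy(out, buf.values, sizeof(buf.values));
    return 0;
}

static void close_counter_group(const int fds[NUM_EVENTS])
{
    for (size_t i = 0; i < NUM_EVENTS; i++)
        close(fds[i]);
}
```

Assuming `vals` was filled by `counters_stop()`, cache hits could be approximated as `vals[3] - vals[4]` (references minus misses). Restricting counting to user space is also what unprivileged users are typically allowed to measure (see `perf_event_paranoid`). Some CPUs cannot schedule five events at once and many virtual machines expose no hardware counters; in those cases `perf_event_open` may fail and the benchmark would need to report no counters.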
The counters will allow generating new interesting stats like cycles/byte, a measure that is helpful for comparing different CPU microarchitectures, with the benefit of being independent of clock speed.
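Once the cycle count for a run is known, cycles/byte is plain arithmetic. A minimal sketch, assuming the counter covered `nb_loops` passes over `src_size` input bytes (both names hypothetical), with instructions per cycle added as a second clock-independent ratio and a helper that converts cycles/byte back into throughput at a given clock:

```c
#include <assert.h>
#include <stdint.h>

/* Cycles spent per input byte over nb_loops passes of src_size bytes. */
static double cycles_per_byte(uint64_t cycles, uint64_t src_size, unsigned nb_loops)
{
    assert(src_size > 0 && nb_loops > 0);
    return (double)cycles / ((double)src_size * (double)nb_loops);
}

/* Instructions retired per cycle: another ratio that does not depend on clock speed. */
static double instructions_per_cycle(uint64_t instructions, uint64_t cycles)
{
    assert(cycles > 0);
    return (double)instructions / (double)cycles;
}

/* Throughput implied by a cycles/byte figure at a given clock, in MB/s (MB = 10^6 bytes). */
static double mb_per_s_at(double cpb, double clock_ghz)
{
    assert(cpb > 0.0);
    return clock_ghz * 1e3 / cpb; /* (GHz * 1e9 bytes/s) / 1e6 */
}
```

With illustrative numbers (not measurements), 3 × 10^9 cycles spent compressing a 10,000,000-byte sample 100 times gives 3 cycles/byte. `mb_per_s_at()` turns that into 1000 MB/s on a 3 GHz core, or about 667 MB/s on a 2 GHz core, which is why the cycles/byte figure itself is the better basis for comparing microarchitectures.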
Plus, cycles spent on I/O operations (e.g. reading files from disk), which a regular 'perf stat' would include, will not be counted, since counters are only captured during the main benchmark loop.
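To illustrate how I/O can be kept out of the numbers, the sketch below only enables a counter around the timed loop. It assumes Linux, a single user-space cycle counter, and a hypothetical `bench_pass_fn` callback standing in for one compression or decompression pass; `open_cycle_counter`, `measure_phase` and the demo workload `sum_bytes` are likewise hypothetical names, not zstd functions. Anything done before `measure_phase()` is called, such as loading the input file, is never counted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical signature for one benchmark pass (e.g. one compression). */
typedef void (*bench_pass_fn)(const unsigned char *src, size_t size, void *ctx);

/* Opens a stopped, user-space-only cycle counter for the calling thread. */
static int open_cycle_counter(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;       /* nothing is counted until explicitly enabled */
    attr.exclude_kernel = 1; /* cycles spent in kernel mode (e.g. page faults) are not counted */
    attr.exclude_hv = 1;
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Counts cycles for nb_loops passes only; work done before or after
 * (file loading, allocation, printing) falls outside the enabled window. */
static int measure_phase(int fd, bench_pass_fn pass, const unsigned char *src,
                         size_t size, void *ctx, unsigned nb_loops, uint64_t *cycles)
{
    assert(pass != NULL && cycles != NULL);
    uint64_t value = 0;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);  /* start each phase from zero */
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    for (unsigned i = 0; i < nb_loops; i++)
        pass(src, size, ctx);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    if (read(fd, &value, sizeof(value)) != (ssize_t)sizeof(value))
        return -1;
    *cycles = value;
    return 0;
}

/* Stand-in workload for the demo: sums the input bytes into *ctx. */
static void sum_bytes(const unsigned char *src, size_t size, void *ctx)
{
    uint64_t *acc = ctx;
    for (size_t i = 0; i < size; i++)
        *acc += src[i];
}
```

In a benchmark, one would open the counter once, load the input, then call `measure_phase()` for the compression loop and again for the decompression loop; the reset at the start of each call keeps the two counts separate, which gives the per-operation figures described above.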
This patch is still in its early stages: the idea is to gather feedback and properly address its current shortcomings, to progress towards a contribution that can be landed in zstd.