[zstd][cli] Add performance counters support to bench mode #4354

Adenilson · 2025-03-29T23:33:51Z

**NOT FOR LANDING**

This adds an extra parameter (-y) to benchmark mode that collects processor performance counters, making it possible to get performance stats per operation (i.e. compression vs. decompression).

We can collect the following performance counters using the Linux perf API: CPU cycles, instructions, branch misses, cache hits and cache misses.
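As a rough illustration of what collecting such a counter involves, here is a minimal sketch that opens one hardware counter for the calling thread through the `perf_event_open` system call. The helper name `open_hw_counter` is hypothetical and is not taken from this patch; the choice to exclude kernel and hypervisor events is also an assumption made for the example.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): open one hardware counter for
 * the calling thread, on whichever CPU it runs. Returns a file descriptor,
 * or -1 on failure (e.g. when perf_event_paranoid forbids access). */
static int open_hw_counter(uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;       /* e.g. PERF_COUNT_HW_CPU_CYCLES */
    attr.disabled = 1;          /* start stopped; enable it explicitly later */
    attr.exclude_kernel = 1;    /* assumption: count user-space work only */
    attr.exclude_hv = 1;

    /* glibc provides no wrapper for this call, so use syscall() directly.
     * pid = 0 and cpu = -1 select "this thread, any CPU". */
    return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL);
}
```

The generic hardware events closest to the list above are `PERF_COUNT_HW_CPU_CYCLES`, `PERF_COUNT_HW_INSTRUCTIONS`, `PERF_COUNT_HW_BRANCH_MISSES`, `PERF_COUNT_HW_CACHE_REFERENCES` and `PERF_COUNT_HW_CACHE_MISSES`; which of them the patch actually uses is not stated here.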

One advantage of leveraging the Linux perf API is that it should work on any processor that runs Linux, so it should work fine on x86-64 (Intel and AMD), Arm (arm32/aarch64) and RISC-V.

The counters make it possible to derive new stats such as cycles/byte, a measure that helps compare different CPU microarchitectures and has the benefit of being independent of clock speed.
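To make the cycles/byte idea concrete, here is a small sketch of the arithmetic. Both function names are hypothetical, and the numbers in the comments are illustrative only, not measurements from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cycles spent per input byte, from a measured cycle count. */
static double cycles_per_byte(uint64_t cycles, size_t bytes)
{
    assert(bytes > 0);
    return (double)cycles / (double)bytes;
}

/* The same quantity estimated from a clock frequency and a throughput.
 * Illustrative: 3.0e9 Hz at 500e6 bytes/s gives 6 cycles/byte. This only
 * holds if the core really runs at clock_hz for the whole measurement,
 * which is why counting cycles directly is more reliable. */
static double cycles_per_byte_from_speed(double clock_hz, double bytes_per_sec)
{
    assert(bytes_per_sec > 0.0);
    return clock_hz / bytes_per_sec;
}
```

Two machines with different clock speeds can report very different MB/s figures yet the same cycles/byte, which is what makes the metric useful for comparing microarchitectures.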

Plus, any I/O operations (e.g. reading files from disk) whose wasted cycles would show up in a regular 'perf stat' run will not be counted, since we only capture counters during the main benchmark loop.
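Restricting the measurement to one region can be sketched as follows, assuming `fd` is a counter descriptor obtained from `perf_event_open` with `disabled = 1` and the default `read_format`. The names `count_region` and `region_fn` are hypothetical and do not come from the patch.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical callback type standing in for the benchmark loop body. */
typedef void (*region_fn)(void *ctx);

/* Count events only while fn(ctx) runs. Returns 0 and stores the counter
 * value in *count on success, or -1 if any perf operation fails. */
static int count_region(int fd, region_fn fn, void *ctx, uint64_t *count)
{
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) == -1) return -1;
    if (ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) == -1) return -1;

    fn(ctx);  /* only work done here is counted */

    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) == -1) return -1;

    /* With the default read_format, read() returns one 64-bit value. */
    if (read(fd, count, sizeof(*count)) != (ssize_t)sizeof(*count)) return -1;
    return 0;
}
```

File loading, buffer allocation and result printing happen outside the enable/disable pair, so they do not contribute to the reported count.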

This patch is still in its early stages: the idea is to gather feedback and address its current shortcomings, working towards a contribution that can be landed in zstd.


Adenilson · 2025-03-29T23:35:05Z

Running with the help flag should print this:
adenilson@aquario:~/compression/my-fork-zstd$ ./programs/zstd --help
*** Zstandard CLI (64-bit) v1.5.8, by Yann Collet ***

Compress or decompress the INPUT file(s); reads from STDIN if INPUT is - or not provided.

Usage: zstd [OPTIONS...] [INPUT... | -] [-o OUTPUT]
...
Benchmark options:
-b# Perform benchmarking with compression level #. [Default: 3]
-e# Test all compression levels up to #; starting level is -b#. [Default: 1]
-i# Set the minimum evaluation to time # seconds. [Default: 3]
-y# Collect CPU counters.

Adenilson · 2025-03-29T23:36:49Z

Two examples when the flag is enabled:

a) Synthetic:
adenilson@aquario:~/compression/my-fork-zstd$ ./programs/zstd -b1y

Perf cycles: 326893971910 -> 3239077 (x3.087), 487.4 MB/s, 2636.7 MB/s

1#

b) With file input:
adenilson@aquario:~/compression/my-fork-zstd$ ./programs/zstd -b1y ~/corpus/linux-5.6-rc3.tar

Perf cycles: 427627890230 -> 190860020 (x5.017), 851.1 MB/s, 2906.7 MB/s

1#

Adenilson · 2025-03-29T23:42:09Z

The basic idea is to give benchmark mode a way to report CPU stats more precisely per operation (e.g. compression vs. decompression), to take cycles spent on I/O out of the equation, and to allow calculating some extra stats (e.g. cycles/byte).

Adenilson · 2025-03-29T23:44:02Z

If this is a feature that could be helpful to zstd, I can develop the patch further to get it into a "land-able" state.

This is just an early draft with the basic idea, a PoC (Proof of Concept).

Adenilson · 2025-03-29T23:45:58Z

I considered using the RDPMC instruction, but its behavior differs between x86-64 implementations (i.e. Intel vs. AMD), and it would be x86-64 only.

On the other hand, it may be possible to collect some extra counters not available using the Linux perf API.

@Cyan4973 thoughts?

Cyan4973 · 2025-03-30T01:23:32Z

I believe this is a good topic.
Benchmark mode is indeed useful to measure performance differences,
and adding counters to this stage is contributing to this objective.
I would just note that current -b already removes I/O operations, so it's purely a buffer-to-buffer operation.
There are also many kinds of counters that could be collected, so I guess the implementation still has a lot of choices to make.
Given it's an advanced feature, not enabled by default, I'm fine with non-portable counters that only exist on some platforms but not others.

facebook-github-bot added the CLA Signed label Mar 29, 2025
