nipafx/benchmarks

Benchmark Lab

Arbitrary Java performance benchmarks, implemented with JMH. You can run them with:

mvn package
java --enable-preview --add-modules=jdk.incubator.vector -jar target/benchmarks.jar ${BENCHMARK}

Replace ${BENCHMARK} with either a package or class name to run just the specified tests.

Benchmarks:

Stream

The benchmarks for the posts Stream Performance and Stream Performance - Your Ideas. Read them for details on the setup.

Code

To tweak the benchmarks, take a look at their superclass AbstractIterationBenchmark.

Results

The results are collected in this Google Spreadsheet.

Parallel Stream Vectorization

Benchmarks for Heinz Kabutz' tweet on computing factorials with a parallel stream, where parallelization increased performance by more than 1x per core.

Code

Results

| Benchmark | (number) | (parallel) |     Score | Units |
| --------- | --------:| ----------:| ---------:| ----- |
| factorial |   200000 |      false | 12267.158 | ms/op |
| factorial |   200000 |       true |   166.953 | ms/op |

This was done on a Ryzen 9 3900X, which has 24 logical cores, so you would expect a speedup of at most 24x, not the measured ~73x (12267.158 ms / 166.953 ms).
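For orientation, here is a minimal sketch of the kind of computation being measured, assuming the factorial is built by reducing a stream of BigInteger values. The class and method names are made up for illustration and are not taken from the repository. One plausible contributor to the superlinear speedup (an assumption, not something measured here) is that a parallel reduction multiplies partial products of similar size, while the sequential version keeps multiplying one huge number by one small number.

```java
import java.math.BigInteger;
import java.util.stream.LongStream;

// Hypothetical sketch, not the repository's benchmark code.
class FactorialSketch {

    // Computes n! by multiplying 1..n, optionally as a parallel stream.
    static BigInteger factorial(int n, boolean parallel) {
        LongStream numbers = LongStream.rangeClosed(1, n);
        if (parallel) {
            numbers = numbers.parallel();
        }
        // Multiplication is associative, so the reduction is safe to parallelize.
        return numbers
                .mapToObj(BigInteger::valueOf)
                .reduce(BigInteger.ONE, BigInteger::multiply);
    }

    public static void main(String[] args) {
        System.out.println(factorial(20, true)); // prints 2432902008176640000
    }
}
```

Both variants must return the same value; only the shape of the multiplication tree differs.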

Stream mapMulti

Benchmarks for the exploration of Stream::mapMulti.
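As a rough illustration of what is being compared, the following sketch filters out zeros once with flatMap and once with mapMulti (available since Java 16). It assumes that each element maps to zero or one output elements; the class and method names are hypothetical and the real benchmark may be set up differently.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch, not the repository's benchmark code.
class MapMultiSketch {

    // flatMap creates a (possibly empty) stream for every element.
    static long countWithFlatMap(List<Integer> numbers) {
        return numbers.stream()
                .flatMap(n -> n == 0 ? Stream.<Integer>empty() : Stream.of(n))
                .count();
    }

    // mapMulti pushes zero or more results straight into the downstream consumer.
    static long countWithMapMulti(List<Integer> numbers) {
        return numbers.stream()
                .<Integer>mapMulti((n, downstream) -> {
                    if (n != 0) {
                        downstream.accept(n);
                    }
                })
                .count();
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(0, 1, 2, 0, 3);
        System.out.println(countWithFlatMap(numbers));  // prints 3
        System.out.println(countWithMapMulti(numbers)); // prints 3
    }
}
```

The explicit `<Integer>` type witness on mapMulti is needed because the compiler cannot infer the result type from the consumer alone.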

Code

Results

| Benchmark      | %0s  | Size      | Score ± Error Units        |
| -------------- | ----:| ---------:| --------------------------:|
| flatMap_count  | 0.01 | 10_000    | 90.743 ± 5.377 us/op       |
| flatMap_count  | 0.01 | 100_000   | 932.125 ± 44.921 us/op     |
| flatMap_count  | 0.01 | 1_000_000 | 9509.989 ± 140.587 us/op   |
| flatMap_count  | 0.1  | 10_000    | 87.546 ± 1.867 us/op       |
| flatMap_count  | 0.1  | 100_000   | 953.468 ± 13.979 us/op     |
| flatMap_count  | 0.1  | 1_000_000 | 10392.573 ± 432.308 us/op  |
| flatMap_count  | 0.5  | 10_000    | 101.504 ± 3.363 us/op      |
| flatMap_count  | 0.5  | 100_000   | 1150.309 ± 17.065 us/op    |
| flatMap_count  | 0.5  | 1_000_000 | 1561.187 ± 324.065 us/op   |
| flatMap_count  | 0.8  | 10_000    | 76.758 ± 7.455 us/op       |
| flatMap_count  | 0.8  | 100_000   | 875.666 ± 22.733 us/op     |
| flatMap_count  | 0.8  | 1_000_000 | 8861.991 ± 207.893 us/op   |
| flatMap_sum    | 0.01 | 10_000    | 113.009 ± 6.977 us/op      |
| flatMap_sum    | 0.01 | 100_000   | 1158.694 ± 74.973 us/op    |
| flatMap_sum    | 0.01 | 1_000_000 | 1622.151 ± 533.694 us/op   |
| flatMap_sum    | 0.1  | 10_000    | 108.073 ± 1.227 us/op      |
| flatMap_sum    | 0.1  | 100_000   | 1155.964 ± 54.148 us/op    |
| flatMap_sum    | 0.1  | 1_000_000 | 1777.393 ± 453.216 us/op   |
| flatMap_sum    | 0.5  | 10_000    | 113.230 ± 5.485 us/op      |
| flatMap_sum    | 0.5  | 100_000   | 1284.879 ± 63.869 us/op    |
| flatMap_sum    | 0.5  | 1_000_000 | 2906.395 ± 259.311 us/op   |
| flatMap_sum    | 0.8  | 10_000    | 79.432 ± 5.875 us/op       |
| flatMap_sum    | 0.8  | 100_000   | 936.235 ± 16.178 us/op     |
| flatMap_sum    | 0.8  | 1_000_000 | 9300.164 ± 222.237 us/op   |
| mapMulti_count | 0.01 | 10_000    | 11.021 ± 3.722 us/op       |
| mapMulti_count | 0.01 | 100_000   | 147.080 ± 35.460 us/op     |
| mapMulti_count | 0.01 | 1_000_000 | 1936.868 ± 54.057 us/op    |
| mapMulti_count | 0.1  | 10_000    | 8.703 ± 1.680 us/op        |
| mapMulti_count | 0.1  | 100_000   | 148.165 ± 4.892 us/op      |
| mapMulti_count | 0.1  | 1_000_000 | 1876.445 ± 76.954 us/op    |
| mapMulti_count | 0.5  | 10_000    | 14.665 ± 6.894 us/op       |
| mapMulti_count | 0.5  | 100_000   | 374.459 ± 20.920 us/op     |
| mapMulti_count | 0.5  | 1_000_000 | 4005.131 ± 62.601 us/op    |
| mapMulti_count | 0.8  | 10_000    | 5.489 ± 0.437 us/op        |
| mapMulti_count | 0.8  | 100_000   | 186.849 ± 4.387 us/op      |
| mapMulti_count | 0.8  | 1_000_000 | 1842.542 ± 80.950 us/op    |
| mapMulti_sum   | 0.01 | 10_000    | 18.161 ± 0.632 us/op       |
| mapMulti_sum   | 0.01 | 100_000   | 183.985 ± 3.982 us/op      |
| mapMulti_sum   | 0.01 | 1_000_000 | 2309.631 ± 60.015 us/op    |
| mapMulti_sum   | 0.1  | 10_000    | 16.771 ± 0.379 us/op       |
| mapMulti_sum   | 0.1  | 100_000   | 187.113 ± 9.438 us/op      |
| mapMulti_sum   | 0.1  | 1_000_000 | 2333.962 ± 72.094 us/op    |
| mapMulti_sum   | 0.5  | 10_000    | 9.876 ± 0.206 us/op        |
| mapMulti_sum   | 0.5  | 100_000   | 393.450 ± 19.738 us/op     |
| mapMulti_sum   | 0.5  | 1_000_000 | 4327.512 ± 394.266 us/op   |
| mapMulti_sum   | 0.8  | 10_000    | 7.022 ± 0.160 us/op        |
| mapMulti_sum   | 0.8  | 100_000   | 218.717 ± 24.815 us/op     |
| mapMulti_sum   | 0.8  | 1_000_000 | 2024.602 ± 62.695 us/op    |

Unmodifiable List

The benchmarks for the post Can instanceof make Unmodifiable Collections faster? on Opsian's blog, which follows up on a Twitter conversation and measures how wrapping lists in Collections::unmodifiableList impacts performance.
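To make the idea concrete, here is a minimal sketch of the two wrapping strategies. Because the JDK's wrapper class is private, the sketch uses its own hypothetical ReadOnlyList class; all names are illustrative and this is not the code from the post or the repository.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not the repository's benchmark code.
class UnmodifiableSketch {

    // Read-only view: AbstractList rejects add/set/remove by default.
    static final class ReadOnlyList<E> extends AbstractList<E> {
        private final List<E> delegate;

        ReadOnlyList(List<E> delegate) {
            this.delegate = delegate;
        }

        @Override
        public E get(int index) {
            return delegate.get(index);
        }

        @Override
        public int size() {
            return delegate.size();
        }
    }

    // Always adds another layer, so every call goes through one more delegate.
    static <E> List<E> wrapAlways(List<E> list) {
        return new ReadOnlyList<>(list);
    }

    // Skips wrapping if the list is already a read-only view.
    static <E> List<E> wrapWithInstanceOfCheck(List<E> list) {
        return list instanceof ReadOnlyList<?> ? list : new ReadOnlyList<>(list);
    }

    // Applies the given wrapper `depth` times, like the Depth column in the results.
    static <E> List<E> wrap(List<E> list, int depth, UnaryOperator<List<E>> wrapper) {
        List<E> wrapped = list;
        for (int i = 0; i < depth; i++) {
            wrapped = wrapper.apply(wrapped);
        }
        return wrapped;
    }
}
```

With wrapAlways, a call like get(0) on a list wrapped 1000 times has to pass through 1000 delegates, whereas the instanceof variant never nests more than one layer.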

Code

To tweak the benchmarks, take a look at their superclass AbstractIterationBenchmark.

Results

Find all results in this Google Spreadsheet. Here's the section for ArrayList with 1 million elements, which is representative of the overall results:

| Benchmark    | Depth |       Score |       Error | Units |
| ------------ | -----:| -----------:| -----------:| ----- |
| forEach      |     0 |    6499.524 |   ± 116.113 | μs/op |
| forEach      |     1 |    6466.625 |    ± 79.003 | μs/op |
| forEach      |    10 |    6572.141 |    ± 49.209 | μs/op |
| forEach      |   100 |    6429.473 |    ± 35.505 | μs/op |
| forEach      |  1000 |    6756.348 |   ± 103.719 | μs/op |
| isEmpty      |     0 |       0.003 |     ± 0.001 | μs/op |
| isEmpty      |     1 |       0.004 |     ± 0.001 | μs/op |
| isEmpty      |    10 |       0.015 |     ± 0.001 | μs/op |
| isEmpty      |   100 |       0.205 |     ± 0.004 | μs/op |
| isEmpty      |  1000 |       2.420 |     ± 0.020 | μs/op |
| iterator     |     0 |    5729.565 |   ± 429.593 | μs/op |
| iterator     |     1 |    7343.901 |   ± 334.612 | μs/op |
| iterator     |    10 |   35199.093 |   ± 285.387 | μs/op |
| iterator     |   100 |  417297.905 |  ± 9552.693 | μs/op |
| iterator     |  1000 | 4778937.048 | ± 46199.758 | μs/op |
| linearAccess |     0 |    3774.128 |    ± 39.618 | μs/op |
| linearAccess |     1 |    4714.176 |    ± 22.733 | μs/op |
| linearAccess |    10 |   30394.111 |   ± 218.719 | μs/op |
| linearAccess |   100 |  417612.502 |  ± 4279.777 | μs/op |
| linearAccess |  1000 | 4661914.890 | ± 30868.705 | μs/op |
| randomAccess |     0 |    7231.825 |   ± 587.417 | μs/op |
| randomAccess |     1 |    7406.953 |   ± 692.301 | μs/op |
| randomAccess |    10 |   26135.629 |   ± 688.995 | μs/op |
| randomAccess |   100 |  243328.907 |  ± 6699.941 | μs/op |
| randomAccess |  1000 | 2457828.925 | ± 17852.538 | μs/op |

And here are the numbers for WrappingBenchmark:

| Benchmark            |  Depth |     Score |      Error | Units |
| -------------------- | ------:| ---------:| ----------:| ----- |
| originalUnmodifiable |      1 |     8.508 |    ± 0.770 | ns/op |
| originalUnmodifiable |     10 |    50.997 |    ± 2.696 | ns/op |
| originalUnmodifiable |    100 |   413.526 |   ± 74.531 | ns/op |
| originalUnmodifiable |  1_000 |  3405.815 |  ± 382.755 | ns/op |
| originalUnmodifiable | 10_000 | 32218.869 | ± 3729.403 | ns/op |
| withInstanceOfCheck  |      1 |     6.236 |    ± 0.514 | ns/op |
| withInstanceOfCheck  |     10 |    15.241 |    ± 0.686 | ns/op |
| withInstanceOfCheck  |    100 |   108.349 |    ± 1.621 | ns/op |
| withInstanceOfCheck  |  1_000 |   964.785 |   ± 19.586 | ns/op |
| withInstanceOfCheck  | 10_000 |  9575.831 |  ± 197.473 | ns/op |

ArrayList::removeAt

Given a large ArrayList and a bunch of indices - what's a good way to get a list without the elements at those indices?
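Two candidate strategies can be sketched as follows. These are illustrative guesses at what such approaches might look like, not the implementations behind the benchmark names in the results; both assume the indices are valid, distinct, and sorted in ascending order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the repository's benchmark code.
class RemoveAtSketch {

    // Copies the list, then removes index by index from the back so that
    // earlier removals do not shift the positions of later ones.
    static <E> List<E> removeIteratively(List<E> source, int[] sortedIndices) {
        List<E> result = new ArrayList<>(source);
        for (int i = sortedIndices.length - 1; i >= 0; i--) {
            result.remove(sortedIndices[i]); // int argument: remove(int index)
        }
        return result;
    }

    // Builds the result in a single pass, skipping the unwanted indices.
    static <E> List<E> copySkipping(List<E> source, int[] sortedIndices) {
        List<E> result = new ArrayList<>(Math.max(0, source.size() - sortedIndices.length));
        int next = 0; // position of the next index to skip
        for (int i = 0; i < source.size(); i++) {
            if (next < sortedIndices.length && sortedIndices[next] == i) {
                next++;
            } else {
                result.add(source.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> letters = List.of("a", "b", "c", "d", "e");
        System.out.println(removeIteratively(letters, new int[] {1, 3})); // prints [a, c, e]
        System.out.println(copySkipping(letters, new int[] {1, 3}));      // prints [a, c, e]
    }
}
```

The iterative variant shifts the tail of the array on every removal, so its cost grows with the number of removals, while the single-pass copy does a fixed amount of work per element.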

Code

Results

Some explaining needs to go into these numbers but I'm not going to do that now (too late). 😛

Benchmark                       (arrayLength)  (removals)    Mode     Cnt       Score      Error  Units
RemoveBenchmark.baseline              10_000           0  sample  424380      63.505 ±    0.222  us/op
RemoveBenchmark.baseline              10_000          10  sample  435078      61.964 ±    0.216  us/op
RemoveBenchmark.baseline              10_000         100  sample  430968      62.539 ±    0.202  us/op
RemoveBenchmark.baseline              10_000       1_000  sample  432169      62.364 ±    0.298  us/op
RemoveBenchmark.baseline             100_000           0  sample   41510     649.561 ±    2.447  us/op
RemoveBenchmark.baseline             100_000          10  sample   42676     631.752 ±    2.078  us/op
RemoveBenchmark.baseline             100_000         100  sample   41851     644.353 ±    2.262  us/op
RemoveBenchmark.baseline             100_000       1_000  sample   41461     650.318 ±    2.550  us/op
RemoveBenchmark.baseline           1_000_000           0  sample    6250    4318.081 ±   55.947  us/op
RemoveBenchmark.baseline           1_000_000          10  sample    6253    4315.854 ±   48.320  us/op
RemoveBenchmark.baseline           1_000_000         100  sample    6306    4279.591 ±   47.078  us/op
RemoveBenchmark.baseline           1_000_000       1_000  sample    6289    4290.965 ±   52.761  us/op
RemoveBenchmark.copyArrayRemove       10_000           0  sample  422433      63.808 ±    0.212  us/op
RemoveBenchmark.copyArrayRemove       10_000          10  sample  363302      74.184 ±    0.235  us/op
RemoveBenchmark.copyArrayRemove       10_000         100  sample   79192      75.628 ±    0.343  us/op
RemoveBenchmark.copyArrayRemove      100_000           0  sample   41897     643.563 ±    2.697  us/op
RemoveBenchmark.copyArrayRemove      100_000          10  sample   36363     741.439 ±    3.704  us/op
RemoveBenchmark.copyArrayRemove      100_000         100  sample   24082     746.417 ±    3.579  us/op
RemoveBenchmark.copyArrayRemove    1_000_000           0  sample    5934    4548.275 ±   61.058  us/op
RemoveBenchmark.copyArrayRemove    1_000_000          10  sample    4058    6654.678 ±  102.694  us/op
RemoveBenchmark.copyArrayRemove    1_000_000         100  sample    4135    6529.373 ±   94.918  us/op
RemoveBenchmark.iterativeAt           10_000           0  sample  424924      63.429 ±    0.149  us/op
RemoveBenchmark.iterativeAt           10_000          10  sample  409336      65.847 ±    0.171  us/op
RemoveBenchmark.iterativeAt           10_000         100  sample  259490     103.919 ±    0.315  us/op
RemoveBenchmark.iterativeAt           10_000       1_000  sample   65966     408.920 ±    1.295  us/op
RemoveBenchmark.iterativeAt          100_000           0  sample   39867     676.439 ±    2.997  us/op
RemoveBenchmark.iterativeAt          100_000          10  sample   36813     732.621 ±    2.605  us/op
RemoveBenchmark.iterativeAt          100_000         100  sample   19385    1391.798 ±    4.587  us/op
RemoveBenchmark.iterativeAt          100_000       1_000  sample    3730    7242.558 ±   31.338  us/op
RemoveBenchmark.iterativeAt        1_000_000           0  sample    6145    4394.814 ±   55.932  us/op
RemoveBenchmark.iterativeAt        1_000_000          10  sample    4289    6297.320 ±   93.764  us/op
RemoveBenchmark.iterativeAt        1_000_000         100  sample    2089   12943.901 ±  123.726  us/op
RemoveBenchmark.iterativeAt        1_000_000       1_000  sample     315   86580.338 ± 2239.426  us/op
RemoveBenchmark.listIterator          10_000           0  sample  338713      79.565 ±    0.262  us/op
RemoveBenchmark.listIterator          10_000          10  sample  291777      92.398 ±    0.445  us/op
RemoveBenchmark.listIterator         100_000           0  sample   28322     951.927 ±   10.195  us/op
RemoveBenchmark.listIterator         100_000          10  sample   23533    1145.802 ±   11.786  us/op
RemoveBenchmark.listIterator       1_000_000           0  sample    2812    9677.106 ± 1049.759  us/op
RemoveBenchmark.listIterator       1_000_000          10  sample    2010   13512.276 ± 1252.236  us/op
RemoveBenchmark.listIterator       1_000_000         100  sample    1319   13733.767 ± 1505.709  us/op
RemoveBenchmark.listIterator       1_000_000       1_000  sample    1117   13468.484 ± 1633.372  us/op
RemoveBenchmark.nullifyRemove         10_000           0  sample  359501      74.971 ±    0.174  us/op
RemoveBenchmark.nullifyRemove         10_000          10  sample  215591     125.113 ±    0.355  us/op
RemoveBenchmark.nullifyRemove         10_000         100  sample  225774     119.462 ±    0.201  us/op
RemoveBenchmark.nullifyRemove         10_000       1_000  sample  228752     117.899 ±    0.396  us/op
RemoveBenchmark.nullifyRemove        100_000           0  sample   36508     738.993 ±    3.087  us/op
RemoveBenchmark.nullifyRemove        100_000          10  sample   20568    1311.620 ±    5.625  us/op
RemoveBenchmark.nullifyRemove        100_000         100  sample   20882    1291.816 ±    5.561  us/op
RemoveBenchmark.nullifyRemove        100_000       1_000  sample   21246    1269.576 ±    3.442  us/op
RemoveBenchmark.nullifyRemove      1_000_000           0  sample    5680    4751.827 ±   55.961  us/op
RemoveBenchmark.nullifyRemove      1_000_000          10  sample    2585   10454.266 ±  103.232  us/op
RemoveBenchmark.nullifyRemove      1_000_000         100  sample    2698   10016.958 ±  114.831  us/op
RemoveBenchmark.nullifyRemove      1_000_000       1_000  sample    2725    9911.803 ±   82.370  us/op

Helpful NPE Messages

In Java 14, JEP 358 introduced more detailed NullPointerException messages. They're deactivated by default for various reasons, one of them performance:

The algorithm adds some overhead to the production of a stack trace. However, this is comparable to the stack walking done when raising the exception. If an application frequently throws and prints messages so that the printing affects performance, already throwing the exception imposes an overhead that definitely should be avoided.

How much overhead do the messages add?

Code

To compare numbers, run the benchmark with (-XX:+ShowCodeDetailsInExceptionMessages) and without (-XX:-ShowCodeDetailsInExceptionMessages) detailed messages.
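The following sketch shows the kinds of NullPointerException that the benchmark names suggest (null array entry, null instance, null return value, manually thrown) and prints each message. The class and helper names are hypothetical. What gets printed depends on the JVM version and on the flag above, so no expected output is given.

```java
// Hypothetical sketch, not the repository's benchmark code.
class NpeSketch {

    static String nullReturn() {
        return null;
    }

    // Runs the action and returns the NPE's message (the text "null" if it has none).
    static String messageOf(Runnable action) {
        try {
            action.run();
            return "no exception";
        } catch (NullPointerException ex) {
            return String.valueOf(ex.getMessage());
        }
    }

    public static void main(String[] args) {
        String[] array = new String[1];
        Object instance = null;

        System.out.println(messageOf(() -> array[0].length()));     // null array entry
        System.out.println(messageOf(() -> instance.toString()));   // null instance
        System.out.println(messageOf(() -> nullReturn().length())); // null return value
        System.out.println(messageOf(() -> {
            throw new NullPointerException();                       // thrown manually
        }));
    }
}
```

Running it once with `-XX:+ShowCodeDetailsInExceptionMessages` and once with `-XX:-ShowCodeDetailsInExceptionMessages` shows the difference in the messages that the benchmark puts a price on.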

Preliminary Results

With detailed message:

Benchmark                                (checkMessage)  Mode  Cnt        Score        Error  Units
NpeBenchmarks.throwNpeFromNullArrayEntry          false  avgt    3        3,924 ±      0,416  ns/op
NpeBenchmarks.throwNpeFromNullArrayEntry           true  avgt    3       26,551 ±      5,265  ns/op
NpeBenchmarks.throwNpeFromNullInstance            false  avgt    3        3,943 ±      0,764  ns/op
NpeBenchmarks.throwNpeFromNullInstance             true  avgt    3       28,233 ±     15,923  ns/op
NpeBenchmarks.throwNpeFromNullReturn              false  avgt    3        4,020 ±      0,269  ns/op
NpeBenchmarks.throwNpeFromNullReturn               true  avgt    3       27,173 ±      8,789  ns/op
NpeBenchmarks.throwNpeManually                    false  avgt    3      919,585 ±     92,235  ns/op
NpeBenchmarks.throwNpeManually                     true  avgt    3     1429,069 ±    103,055  ns/op

Without detailed message:

Benchmark                                (checkMessage)  Mode  Cnt        Score        Error  Units
NpeBenchmarks.throwNpeFromNullArrayEntry          false  avgt    3        3,907 ±      0,991  ns/op
NpeBenchmarks.throwNpeFromNullArrayEntry           true  avgt    3       21,764 ±      4,887  ns/op
NpeBenchmarks.throwNpeFromNullInstance            false  avgt    3        3,910 ±      0,434  ns/op
NpeBenchmarks.throwNpeFromNullInstance             true  avgt    3       21,098 ±      0,948  ns/op
NpeBenchmarks.throwNpeFromNullReturn              false  avgt    3        3,875 ±      0,192  ns/op
NpeBenchmarks.throwNpeFromNullReturn               true  avgt    3       21,593 ±      0,409  ns/op
NpeBenchmarks.throwNpeManually                    false  avgt    3      924,070 ±    194,407  ns/op
NpeBenchmarks.throwNpeManually                     true  avgt    3      941,316 ±    172,077  ns/op

Vector API

The vector API promises reliable vectorization of numerical and logical computations, particularly in loops. The comparison is between "regular" loops (that may be auto-vectorized by the JIT) and vectorized loops.
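As a rough illustration of that comparison, here is a sketch of a "regular" and a vectorized loop that both invert the RGB channels of ARGB pixels. It assumes pixels are stored as an int array and is not the repository's ImageColors code; the class and method names are made up. Like the benchmarks, it needs `--add-modules=jdk.incubator.vector` to compile and run.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical sketch, not the repository's benchmark code.
class InvertColorsSketch {

    private static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;
    // Flips the red, green, and blue bits but leaves the alpha channel untouched.
    private static final int RGB_MASK = 0x00FFFFFF;

    // "Regular" loop that the JIT may or may not auto-vectorize.
    static void invertScalar(int[] pixels) {
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] ^= RGB_MASK;
        }
    }

    // Explicitly vectorized loop: processes SPECIES.length() pixels per iteration.
    static void invertVectorized(int[] pixels) {
        int i = 0;
        int bound = SPECIES.loopBound(pixels.length);
        for (; i < bound; i += SPECIES.length()) {
            IntVector.fromArray(SPECIES, pixels, i)
                    .lanewise(VectorOperators.XOR, RGB_MASK)
                    .intoArray(pixels, i);
        }
        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < pixels.length; i++) {
            pixels[i] ^= RGB_MASK;
        }
    }
}
```

The scalar tail loop is a common pattern with the vector API because array lengths are rarely an exact multiple of the vector length.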

Code

Preliminary Results

Benchmark                            Mode  Cnt       Score       Error  Units
ImageColors.invertColors             avgt    9   52158.239 ± 19513.007  us/op
ImageColors.invertColors_vectorized  avgt    9   15036.490 ±  4313.103  us/op
ImageColors.purpleShift              avgt    9  217830.220 ± 93890.988  us/op
ImageColors.purpleShift_vectorized   avgt    9   24536.969 ±  9002.955  us/op
ImageColors.rotateColors             avgt    9   49739.629 ± 19339.702  us/op
ImageColors.rotateColors_vectorized  avgt    9   14960.650 ±  4031.095  us/op

These numbers were taken on my laptop without shutting down other programs, so they're not very reliable.
