[benchmark] Fix benchmarks for Set operations #18928

Merged: 6 commits, Aug 24, 2018

lorentey
Member

@lorentey lorentey commented Aug 23, 2018

  • Don’t use a random number generator. (Hashing already scrambles elements.)
  • Add benchmarks for Set.subtracting.
  • Add new benchmarks for 0%, 25%, 50% and 100% overlap between sets. (The latter two only for Set<Int>.)
  • Also keep original benchmarks where feasible. (These only tested disjoint sets, so they were relatively useless.)

- Don’t use a random number generator
- Ensure there is a 25% overlap between sets. Original benchmarks tested non-overlapping sets only.
- Change names; results aren’t directly comparable.
@lorentey
Member Author

@swift-ci please smoke test

@lorentey
Member Author

@swift-ci please benchmark

@lorentey lorentey requested a review from airspeedswift August 23, 2018 18:54
Member

@airspeedswift airspeedswift left a comment

Nice

BenchmarkInfo(name: "SetIsSubsetOf_OfObjects", runFunction: run_SetIsSubsetOf_OfObjects, tags: [.validation, .api, .Set]),
BenchmarkInfo(name: "SetUnion", runFunction: run_SetUnion, tags: [.validation, .api, .Set]),
BenchmarkInfo(name: "SetUnion_OfObjects", runFunction: run_SetUnion_OfObjects, tags: [.validation, .api, .Set]),
BenchmarkInfo(name: "SetExclusiveOr2", runFunction: run_SetExclusiveOr2, tags: [.validation, .api, .Set]),
Member

Is the 2 there so that we don't track this as a change in performance of this benchmark? @eeckstein is this important or should we not bother?

Contributor

It is important. Renaming avoids showing these changes as false compiler improvements/regressions.

Contributor

...but there is no need to rename the actual run functions, it is enough to change the name in BenchmarkInfo.

That function rename was the source of the build error.
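
As a minimal sketch of the point above: only the string passed as `name` needs to change, while the run function keeps its identifier. The `BenchmarkInfo` struct below is a simplified, hypothetical stand-in for the benchmark suite's type, and the body of `run_SetExclusiveOr` is invented for illustration.

```swift
// Hypothetical stand-in for the suite's BenchmarkInfo; only the fields
// needed for this illustration are modeled.
struct BenchmarkInfo {
  let name: String
  let runFunction: (Int) -> Void
}

// The run function keeps its original identifier (illustrative body).
func run_SetExclusiveOr(_ n: Int) {
  let a = Set(0 ..< 4)
  let b = Set(2 ..< 6)
  for _ in 0 ..< n {
    // {0, 1, 2, 3} xor {2, 3, 4, 5} leaves four elements.
    precondition(a.symmetricDifference(b).count == 4)
  }
}

// Only the reported name changes, so results are tracked as a new benchmark
// instead of showing up as a change to the old one.
let renamed = BenchmarkInfo(name: "SetExclusiveOr2", runFunction: run_SetExclusiveOr)
renamed.runFunction(1)
print(renamed.name)
```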

@palimondo
Contributor

palimondo commented Aug 23, 2018

Since we are already renaming these, can we also lower the base workloads on these benchmarks so they run under 1ms (1000 microseconds)?

... and take care of the weird naming and setup overhead?

You can run Benchmark_Driver check -f Set.* and get a validation report.

I got this on my 2008 MBP:

WARNING runtime: 'SetExclusiveOr' execution took at least 20854 μs.
INFO runtime: Decrease the workload of 'SetExclusiveOr' by a factor of 16, to be less than 2500 μs.
ERROR naming: 'SetExclusiveOr_OfObjects' name doesn't conform to UpperCamelCase convention.
INFO naming: See http://bit.ly/UpperCamelCase
WARNING runtime: 'SetExclusiveOr_OfObjects' execution took at least 52514 μs.
INFO runtime: Decrease the workload of 'SetExclusiveOr_OfObjects' by a factor of 32, to be less than 2500 μs.
WARNING runtime: 'SetIntersect' execution took at least 4390 μs.
INFO runtime: Decrease the workload of 'SetIntersect' by a factor of 2, to be less than 2500 μs.
ERROR naming: 'SetIntersect_OfObjects' name doesn't conform to UpperCamelCase convention.
INFO naming: See http://bit.ly/UpperCamelCase
ERROR runtime: 'SetIntersect_OfObjects' has setup overhead of 860 μs (9.3%).
INFO runtime: Move initialization of benchmark data to the `setUpFunction` registered in `BenchmarkInfo`.
WARNING runtime: 'SetIntersect_OfObjects' execution took at least 8400 μs (excluding the setup overhead).
INFO runtime: Decrease the workload of 'SetIntersect_OfObjects' by a factor of 4, to be less than 2500 μs.
ERROR naming: 'SetIsSubsetOf_OfObjects' name doesn't conform to UpperCamelCase convention.
INFO naming: See http://bit.ly/UpperCamelCase
ERROR runtime: 'SetIsSubsetOf_OfObjects' has setup overhead of 422 μs (17.1%).
INFO runtime: Move initialization of benchmark data to the `setUpFunction` registered in `BenchmarkInfo`.
WARNING runtime: 'SetUnion' execution took at least 18467 μs.
INFO runtime: Decrease the workload of 'SetUnion' by a factor of 8, to be less than 2500 μs.
ERROR naming: 'SetUnion_OfObjects' name doesn't conform to UpperCamelCase convention.
INFO naming: See http://bit.ly/UpperCamelCase
WARNING runtime: 'SetUnion_OfObjects' execution took at least 44172 μs.
INFO runtime: Decrease the workload of 'SetUnion_OfObjects' by a factor of 32, to be less than 2500 μs.
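
The suggested factors in this report are consistent with choosing the smallest power of two that brings the measured runtime under the 2500 μs threshold. That rule is inferred from the numbers above, not taken from Benchmark_Driver's source; `suggestedFactor` is a hypothetical helper written only to check the arithmetic.

```swift
// Smallest power of two that brings `runtime` (in μs) below `threshold`.
// Inferred from the report above; not Benchmark_Driver's actual code.
func suggestedFactor(runtime: Int, threshold: Int = 2500) -> Int {
  var factor = 1
  while runtime / factor >= threshold {
    factor *= 2
  }
  return factor
}

// (name, reported runtime in μs, factor suggested in the report)
let reported = [
  ("SetExclusiveOr", 20854, 16),
  ("SetExclusiveOr_OfObjects", 52514, 32),
  ("SetIntersect", 4390, 2),
  ("SetIntersect_OfObjects", 8400, 4),
  ("SetUnion", 18467, 8),
  ("SetUnion_OfObjects", 44172, 32),
]

for (name, runtime, factor) in reported {
  print(name, suggestedFactor(runtime: runtime), factor)
}
```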

The CI machine is much faster; only SetExclusive* and SetUnion* might require lowering the base load there...

For a good template on how to exclude setup overhead, see this commit by @eeckstein.
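
A rough sketch of the idea behind the `setUpFunction` advice in the report: the workload lives in lazily initialized storage, and the setup closure touches it once so that construction happens outside the timed region. `BenchmarkInfo`, `blackHole`, the `Workload` enum and the benchmark name are simplified, hypothetical stand-ins, not the suite's real definitions.

```swift
// Hypothetical stand-ins for the suite's helpers; the names mirror snippets
// quoted in this thread, but the bodies are assumptions.
@inline(never)
func blackHole<T>(_ x: T) {}

struct BenchmarkInfo {
  let name: String
  let runFunction: (Int) -> Void
  let setUpFunction: (() -> Void)?
}

// Static stored properties are initialized lazily, on first access.
enum Workload {
  static let setAB = Set(0 ..< 400)
  static let setBC = Set(300 ..< 700)
}

let info = BenchmarkInfo(
  name: "ExampleIntersection", // hypothetical name
  runFunction: { n in
    for _ in 0 ..< n {
      blackHole(Workload.setAB.intersection(Workload.setBC))
    }
  },
  // Touching the sets here forces their construction before timing starts.
  setUpFunction: { blackHole([Workload.setAB, Workload.setBC]) })

info.setUpFunction?() // untimed: builds the sets
info.runFunction(10)  // timed region only performs the set operation
```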

@swift-ci
Contributor

Build comment file:

Build failed before running benchmark.


The code isn’t the same, but comparisons with previous releases will still be useful (as long as timings match with the original code).
@lorentey
Member Author

Welp, serves me right for not going the whole way.

The latest commit adds benchmarks with names and parameters matching the original benchmarks. (But without setup costs etc.) If they come out close to the originals, I think there's still value in keeping them, so that we have an easy way to compare releases.

(Swift 4.2 and 5 are both extensively changing the internals of hashing as well as Set and Dictionary themselves; it makes sense not to break old benchmarks if we can help it -- even if they are of questionable practical value.)

@lorentey
Member Author

I need to try the shiny new thing!

@swift-ci please smoke benchmark staging

@lorentey
Member Author

@swift-ci please smoke test

let size = 400
let overlap = 100

let setAB = Set(0 ..< size)
Contributor

I was drawing number lines like the dummy that I am… a little bit of spelling out would help me grok the correctness of this much sooner. What do you think?

let setAB = Set(0 ..< size)                            // 0...399
let setCD = Set(size ..< 2 * size)                     // 400...799
let setBC = Set(size - overlap ..< 2 * size - overlap) // 300...699
let setB = Set(size - overlap ..< size)                // 300...399
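
The annotated ranges can be checked mechanically. The sketch below reuses the definitions quoted above; `overlapPercent` is a hypothetical helper added only for this check.

```swift
let size = 400
let overlap = 100

let setAB = Set(0 ..< size)                            // 0...399
let setCD = Set(size ..< 2 * size)                     // 400...799
let setBC = Set(size - overlap ..< 2 * size - overlap) // 300...699
let setB = Set(size - overlap ..< size)                // 300...399

// Shared elements as a percentage of the first set's size.
func overlapPercent(_ x: Set<Int>, _ y: Set<Int>) -> Int {
  return 100 * x.intersection(y).count / x.count
}

print(overlapPercent(setAB, setCD))      // disjoint pair
print(overlapPercent(setAB, setBC))      // partially overlapping pair
print(overlapPercent(setAB, setAB))      // identical sets
print(setAB.intersection(setBC) == setB) // the shared part is exactly setB
```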

setUpFunction: { blackHole([setAB, setBC]) }),
BenchmarkInfo(
name: "SetExclusiveOr2_OfObjects",
runFunction: { n in run_SetExclusiveOr_OfObjects(setOAB, setOBC, countAC, 10 * n) },
Contributor

@palimondo palimondo Aug 24, 2018

I'm skeptical that the same multiplier (10 * n) for objects as for Ints wouldn't overshoot the healthy runtime. Do you aim for under 1ms (1000 μs) runtimes? I see these are much lower than the originals, so it might just work out… Do you want to keep the multipliers the same to show the cost of going from Ints to Boxes? That would be great if their difference isn't bigger than 100x… 🤨
Also, conventionally the variable has been called N (capitalized); IMO it would make sense not to break with tradition here.

Member Author

The difference between Int and Box<Int> is relatively small; it's merely an extra indirection. It is important to remain able to keep track of the difference, though.

Capitalized names for function parameters is a silly tradition; I'm happy to be the barbarian who breaks it. ;-)

Member Author

(Oh, beyond the indirection, Box also hashes a bit slower than Int since it's not forwarding the one-shot hashing implementation.)
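
For readers unfamiliar with the boxed variants, here is a sketch of what such a wrapper might look like. The shape of `Box` is an assumption; the benchmark's actual definition may differ. It illustrates the extra indirection (a class instance holding the value) and a `hash(into:)` that simply feeds the wrapped value to the hasher.

```swift
// Assumed shape of a boxing wrapper; the benchmark's actual Box may differ.
final class Box<T: Hashable>: Hashable {
  let value: T

  init(_ value: T) {
    self.value = value
  }

  static func == (lhs: Box<T>, rhs: Box<T>) -> Bool {
    return lhs.value == rhs.value
  }

  // Reaching `value` costs one extra pointer dereference compared with
  // hashing a bare Int stored inline in the set.
  func hash(into hasher: inout Hasher) {
    hasher.combine(value)
  }
}

let ints = Set(0 ..< 400)
let boxes = Set((0 ..< 400).map { Box($0) })
print(ints.count, boxes.count)
print(boxes.contains(Box(399)))
```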

tags: [.validation, .api, .Set],
setUpFunction: { blackHole([setAB, setBC]) }),
BenchmarkInfo(
name: "SetExclusiveOr_OfObjects",
Contributor

@palimondo palimondo Aug 24, 2018

Could you please drop the underscores from test names, at least from the new ones? I'd call them something like SetExclusiveOrInt and SetExclusiveOrBoxed, if coming up with the names from scratch.

Member Author

Sure, as long as we end up losing compatibility. But if some of the original names remain, I'd prefer to make the relationship between old/new tests obvious. (Naming consistency is far less important than utility in this case.)

Contributor

@eeckstein cares a lot about continuity of perf-tracking… I might have a look at adding some lnt migration scripts for that if it remains a point of contention.

@lorentey
Member Author

@swift-ci please smoke benchmark staging

Contributor

@palimondo palimondo left a comment

LGTM (Legendary Great ™)*

* pending rename of new benchmarks (old too, when continuity fails)

@swift-ci
Contributor

Build comment file:

Performance: -O

Improvement
TEST                       OLD   NEW   DELTA   SPEEDUP
SetIsSubsetOf              321   256   -20.2%  1.25x
SetIntersect_OfObjects     1926  1542  -19.9%  1.25x
SetIsSubsetOf_OfObjects    457   372   -18.6%  1.23x

Added
TEST                       MIN   MAX   MEAN    MAX_RSS
SetExclusiveOr2            355   360   357
SetExclusiveOr2_OfObjects  754   758   755
SetIntersect2              158   168   161
SetIntersect2_OfObjects    368   371   369
SetIsSubsetOf2             72    73    72
SetIsSubsetOf2_OfObjects   179   181   179
SetUnion2                  242   247   243
SetUnion2_OfObjects        519   552   532

Performance: -Osize

Improvement
TEST                       OLD   NEW   DELTA   SPEEDUP
SetIntersect_OfObjects     1875  1553  -17.2%  1.21x
SetIsSubsetOf_OfObjects    458   382   -16.6%  1.20x

Added
TEST                       MIN   MAX   MEAN    MAX_RSS
SetExclusiveOr2            379   382   380
SetExclusiveOr2_OfObjects  740   740   740
SetIntersect2              167   175   170
SetIntersect2_OfObjects    366   369   367
SetIsSubsetOf2             73    75    73
SetIsSubsetOf2_OfObjects   181   184   182
SetUnion2                  266   271   268
SetUnion2_OfObjects        543   545   543

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

_ r: Bool,
_ n: Int) {
for _ in 0 ..< n {
let isSubset = a.isSubset(of: identity(b))
Contributor

One more thing… could you document the purpose of wrapping b in identity here?

Member Author

It should be a relatively common idiom in these benchmarks -- it's there to prevent the compiler from moving things out of the loop. a and b are constant through all iterations, and in theory, a sufficiently smart compiler could optimize some/all of it away. Adding an opaque function call with unknown effects (hopefully) defeats these optimizations.

(I don't think any optimizations would apply in these cases, but it's better to be safe than sorry.)
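
A small sketch of the idiom described above. The `identity` function here is a local stand-in modeled on the description (an opaque, never-inlined pass-through), and `runIsSubset` is a hypothetical run function loosely following the quoted snippet; whether a given compiler version would actually hoist the call without it is not something this example demonstrates.

```swift
// Local stand-in for the suite's helper: an opaque, never-inlined
// pass-through that returns its argument unchanged.
@inline(never)
func identity<T>(_ x: T) -> T {
  return x
}

// Hypothetical run function: counts how often the result matches `expected`.
func runIsSubset(_ a: Set<Int>, _ b: Set<Int>, _ expected: Bool, _ n: Int) -> Int {
  var matches = 0
  for _ in 0 ..< n {
    // Passing `b` through `identity` is meant to hide that the operand is
    // loop-invariant, so the check is not moved out of the loop.
    let isSubset = a.isSubset(of: identity(b))
    if isSubset == expected {
      matches += 1
    }
  }
  return matches
}

let matched = runIsSubset(Set(300 ..< 400), Set(0 ..< 400), true, 1000)
print(matched)
```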

Contributor

That explanation makes perfect sense. How about putting it in the source comment? That way it’s easier to point people unfamiliar with the idiom to this file as an example of best practice (link from docs, when we get to it).

- Isolate legacy tests. Add new tests for the same operations, with updated iteration counts and names.
- Rename new tests to follow a consistent naming scheme.
- Add tests for Set.subtracting.
- Add tests on integer sets with 50% and 100% overlap. (isSubset, intersection, union, symmetricDifference, subtracting)
@lorentey
Member Author

@swift-ci please smoke benchmark staging

@lorentey
Member Author

@swift-ci please smoke test

@lorentey
Member Author

lorentey commented Aug 24, 2018

I added a large number of extra benchmark cases.

Having a variety of non-disjoint cases helps to keep the code honest -- otherwise we could end up optimizing for particular inputs and neglect/harm the others.

E.g. we currently perform set subtractions by elementwise removal, which optimizes for the small-overlap case with uniquely held storage, but it isn't very good when faced with COW copies or large overlaps.
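
To make the trade-off concrete, here is an illustrative comparison of two ways to subtract one set from another. Neither function is the standard library's implementation; both are hypothetical sketches of the strategies being contrasted.

```swift
// Strategy 1: copy `a`, then remove the elements of `b` one by one.
// Cheap when few elements are removed and the storage is uniquely held;
// with shared (copy-on-write) storage, mutating `result` forces a full
// copy before elements are removed.
func subtractByRemoval(_ a: Set<Int>, _ b: Set<Int>) -> Set<Int> {
  var result = a
  for element in b {
    result.remove(element)
  }
  return result
}

// Strategy 2: build a fresh set containing only the survivors.
// Never copies elements that are about to be discarded.
func subtractByFiltering(_ a: Set<Int>, _ b: Set<Int>) -> Set<Int> {
  return a.filter { !b.contains($0) }
}

let a = Set(0 ..< 400)
let b = Set(300 ..< 700)
let byRemoval = subtractByRemoval(a, b)
let byFiltering = subtractByFiltering(a, b)
print(byRemoval == byFiltering, byRemoval.count)
```

Benchmarks at several overlap levels exercise both regimes, which is what makes a change of strategy measurable.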

@swift-ci
Contributor

Build comment file:

Performance: -O

TEST MIN MAX MEAN MAX_RSS
Added
SetIntersectionBox0 163 165 163
SetIntersectionBox25 367 369 368
SetIntersectionInt0 59 62 60
SetIntersectionInt100 465 468 466
SetIntersectionInt25 162 168 164
SetIntersectionInt50 263 272 266
SetIsSubsetBox0 356 356 356
SetIsSubsetBox25 181 192 184
SetIsSubsetInt0 240 241 240
SetIsSubsetInt100 299 303 301
SetIsSubsetInt25 73 75 74
SetIsSubsetInt50 143 144 143
SetSubtractingBox0 173 182 176
SetSubtractingBox25 363 366 364
SetSubtractingInt0 71 73 71
SetSubtractingInt100 239 253 245
SetSubtractingInt25 145 150 146
SetSubtractingInt50 183 188 185
SetSymmetricDifferenceBox0 1108 1111 1109
SetSymmetricDifferenceBox25 725 727 726
SetSymmetricDifferenceInt0 496 500 497
SetSymmetricDifferenceInt100 304 306 304
SetSymmetricDifferenceInt25 344 349 346
SetSymmetricDifferenceInt50 340 343 341
SetUnionBox0 957 961 958
SetUnionBox25 522 522 522
SetUnionInt0 429 445 436
SetUnionInt100 101 104 102
SetUnionInt25 249 255 251
SetUnionInt50 210 215 212
Removed
SetIntersect_OfObjects 1908 1960 1931
SetIsSubsetOf 327 328 327
SetIsSubsetOf_OfObjects 463 484 474

Performance: -Osize

TEST MIN MAX MEAN MAX_RSS
Added
SetIntersectionBox0 168 171 169
SetIntersectionBox25 369 372 370
SetIntersectionInt0 61 63 61
SetIntersectionInt100 479 486 481
SetIntersectionInt25 171 176 172
SetIntersectionInt50 273 281 276
SetIsSubsetBox0 370 370 370
SetIsSubsetBox25 182 183 182
SetIsSubsetInt0 258 260 258
SetIsSubsetInt100 296 299 297
SetIsSubsetInt25 75 75 75
SetIsSubsetInt50 148 150 149
SetSubtractingBox0 173 175 173
SetSubtractingBox25 370 374 372
SetSubtractingInt0 85 87 85
SetSubtractingInt100 268 274 270
SetSubtractingInt25 164 171 166
SetSubtractingInt50 205 208 206
SetSymmetricDifferenceBox0 1118 1121 1119
SetSymmetricDifferenceBox25 746 748 746
SetSymmetricDifferenceInt0 510 515 512
SetSymmetricDifferenceInt100 314 316 314
SetSymmetricDifferenceInt25 347 352 349
SetSymmetricDifferenceInt50 344 348 345
SetUnionBox0 973 975 973
SetUnionBox25 535 537 536
SetUnionInt0 447 451 448
SetUnionInt100 120 126 123
SetUnionInt25 251 256 252
SetUnionInt50 214 218 215
Removed
SetIntersect_OfObjects 1954 2013 1984
SetIsSubsetOf 339 340 339
SetIsSubsetOf_OfObjects 476 477 476

BenchmarkInfo(
name: "SetIsSubsetInt0",
runFunction: { n in run_SetIsSubsetInt(setAB, setCD, false, 5000 * n) },
tags: [.validation, .api, .Set],
Contributor

@palimondo palimondo Aug 24, 2018

Tags are the same for all tests here. How about extracting them to a constant to DRY up the declarations a bit?

name: "Set…", tags: tags,

Oh, I forgot Swift isn’t Python and we can’t reorder named parameters. Never mind then… 🤦‍♂️

@lorentey lorentey merged commit a01938a into swiftlang:master Aug 24, 2018
@palimondo
Contributor

Excellent work! So glad to see cleanup and rejuvenation of the old crusty tests!

5 participants