[benchmark] Fix benchmarks for Set operations #18928
- Don’t use a random number generator.
- Ensure there is a 25% overlap between sets. The original benchmarks tested non-overlapping sets only.
- Change names; results aren’t directly comparable.
@swift-ci please smoke test
@swift-ci please benchmark
Nice
BenchmarkInfo(name: "SetIsSubsetOf_OfObjects", runFunction: run_SetIsSubsetOf_OfObjects, tags: [.validation, .api, .Set]),
BenchmarkInfo(name: "SetUnion", runFunction: run_SetUnion, tags: [.validation, .api, .Set]),
BenchmarkInfo(name: "SetUnion_OfObjects", runFunction: run_SetUnion_OfObjects, tags: [.validation, .api, .Set]),
BenchmarkInfo(name: "SetExclusiveOr2", runFunction: run_SetExclusiveOr2, tags: [.validation, .api, .Set]),
The 2 suffix: is that so we don't track this as a change in performance of this benchmark? @eeckstein, is this important, or should we not bother?
It is important. Renaming avoids showing these changes as false compiler improvements/regressions.
...but there is no need to rename the actual run functions; it is enough to change the name in BenchmarkInfo. That function rename was the source of the build error.
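A minimal sketch of the suggestion, using a simplified stand-in for the real BenchmarkInfo type (names are illustrative): only the public benchmark name gets the suffix, while the run function keeps its original identifier, so no call sites change.

```swift
// Simplified stand-in for the benchmark suite's BenchmarkInfo type,
// just to illustrate the renaming suggestion above.
struct BenchmarkInfo {
  let name: String
  let runFunction: (Int) -> Void
}

// The run function keeps its original identifier, avoiding build errors.
func run_SetExclusiveOr(_ n: Int) { /* original workload, unchanged */ }

// Only the public benchmark name changes, which breaks the (now
// misleading) perf-tracking continuity on purpose.
let info = BenchmarkInfo(
  name: "SetExclusiveOr2",
  runFunction: run_SetExclusiveOr)
assert(info.name == "SetExclusiveOr2")
```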
Since we are already renaming these, can we also lower the base workloads on these benchmarks so they run under 1 ms (1000 μs)? …and take care of the weird naming and setup overhead? You can run […]; I got this on my 2008 MBP: […]. The CI machine is much faster, only […]. For a good template on how to exclude setup overhead, see this commit by @eeckstein.
Build comment file: Build failed before running benchmark.
The code isn’t the same, but comparisons with previous releases will still be useful (as long as timings match with the original code).
Welp, serves me right for not going the whole way. The latest commit adds benchmarks with names and parameters matching the original benchmarks. (But without setup costs etc.) If they come out close to the originals, I think there's still value in keeping them, so that we have an easy way to compare releases. (Swift 4.2 and 5 are both extensively changing the internals of hashing as well as Set and Dictionary itself; it makes sense not to break old benchmarks if we can help it -- even if they are of questionable practical value.)
I need to try the shiny new thing! @swift-ci please smoke benchmark staging
@swift-ci please smoke test
let size = 400
let overlap = 100

let setAB = Set(0 ..< size)
I was drawing number lines like the dummy that I am… a little bit of spelling out would help me grok the correctness of this much sooner. What do you think?
let setAB = Set(0 ..< size) // 0...399
let setCD = Set(size ..< 2 * size) // 400...799
let setBC = Set(size - overlap ..< 2 * size - overlap) // 300...699
let setB = Set(size - overlap ..< size) // 300...399
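The annotated construction above can be sanity-checked directly; a small self-contained sketch (using the same sizes and names as the snippet) confirming the intended overlaps:

```swift
// Same construction as above: size = 400, overlap = 100 (25%).
let size = 400
let overlap = 100

let setAB = Set(0 ..< size)                             // 0...399
let setCD = Set(size ..< 2 * size)                      // 400...799
let setBC = Set(size - overlap ..< 2 * size - overlap)  // 300...699
let setB  = Set(size - overlap ..< size)                // 300...399

// AB and CD are fully disjoint; AB and BC share exactly `overlap`
// elements; B is the shared slice itself.
assert(setAB.intersection(setCD).isEmpty)
assert(setAB.intersection(setBC).count == overlap)
assert(setB.isSubset(of: setAB) && setB.isSubset(of: setBC))
```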
setUpFunction: { blackHole([setAB, setBC]) }),
BenchmarkInfo(
  name: "SetExclusiveOr2_OfObjects",
  runFunction: { n in run_SetExclusiveOr_OfObjects(setOAB, setOBC, countAC, 10 * n) },
I'm skeptical that the same multiplier (10 * n) for objects as for Ints wouldn't overshoot the healthy runtime. Do you aim for under 1 ms (1000 μs) runtimes? I see these are much lower than the originals, so it might just work out… Do you want to keep the multipliers the same to show the cost of going from Ints to Boxes? That would be great if their difference isn't bigger than 100x… 🤨
Also, conventionally the variable has been called with a capital N; IMO it would make sense not to break with tradition here.
The difference between Int and Box&lt;Int&gt; is relatively small; it's merely an extra indirection. It is important to remain able to keep track of the difference, though.
Capitalized names for function parameters are a silly tradition; I'm happy to be the barbarian who breaks it. ;-)
(Oh, beyond the indirection, Box also hashes a bit slower than Int since it's not forwarding the one-shot hashing implementation.)
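For context, a hedged sketch of such a wrapper (illustrative; not necessarily the suite's actual Box type): a class wrapper adds the indirection mentioned above, and implementing hash(into:) by forwarding to the wrapped value is what keeps it on Hashable's one-shot hashing path; a conformance that only computes hashValue would not.

```swift
// Illustrative Box wrapper: a class introduces one pointer indirection
// compared to a plain Int element.
final class Box<T: Hashable>: Hashable {
  let value: T
  init(_ value: T) { self.value = value }

  // Forwarding hash(into:) lets Box participate in one-shot hashing,
  // like Int does; a hashValue-only conformance would hash more slowly.
  func hash(into hasher: inout Hasher) {
    hasher.combine(value)
  }

  static func == (lhs: Box, rhs: Box) -> Bool {
    lhs.value == rhs.value
  }
}

// Equal wrapped values are deduplicated, just like raw Ints.
let boxes: Set<Box<Int>> = [Box(1), Box(2), Box(1)]
assert(boxes.count == 2)
```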
tags: [.validation, .api, .Set],
setUpFunction: { blackHole([setAB, setBC]) }),
BenchmarkInfo(
  name: "SetExclusiveOr_OfObjects",
Could you please drop the underscores from the test names, at least from the new ones? I'd call them SetExclusiveOrInt and SetExclusiveOrBoxed, if coming up with the names from scratch.
Sure, as long as we end up losing compatibility. But if some of the original names remain, I'd prefer to make the relationship between old/new tests obvious. (Naming consistency is far less important than utility in this case.)
@eeckstein cares a lot about continuity of perf-tracking… I might have a look at adding some lnt migration scripts for that if it remains a point of contention.
@swift-ci please smoke benchmark staging
LGTM (Legendary Great ™)*
* pending rename of new benchmarks (old too, when continuity fails)
Build comment file: Performance: -O / Performance: -Osize [tables elided]

How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%. If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR. Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein). Hardware Overview: [elided]
  _ r: Bool,
  _ n: Int) {
  for _ in 0 ..< n {
    let isSubset = a.isSubset(of: identity(b))
One more thing… could you document the purpose of wrapping b in identity here?
It should be a relatively common idiom in these benchmarks -- it's there to prevent the compiler from moving things out of the loop. a and b are constant through all iterations, and in theory, a sufficiently smart compiler could optimize some or all of it away. Adding an opaque function call with unknown effects (hopefully) defeats these optimizations.
(I don't think any optimizations would apply in these cases, but it's better to be safe than sorry.)
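The idiom can be sketched in a few lines (the run function's signature follows the diff fragment above; the identity helper shown here is a stand-in for the suite's utility):

```swift
// Marking identity as never-inlined makes it opaque to the optimizer,
// so the argument can no longer be treated as loop-invariant.
@inline(never)
func identity<T>(_ x: T) -> T { return x }

func run_SetIsSubset(_ a: Set<Int>, _ b: Set<Int>, _ r: Bool, _ n: Int) {
  for _ in 0 ..< n {
    // Without identity(b), the compiler could in principle compute the
    // subset check once and hoist it out of the timing loop.
    let isSubset = a.isSubset(of: identity(b))
    precondition(isSubset == r)
  }
}

// Tiny smoke run: 0..<10 is a subset of 0..<20, so r is true.
run_SetIsSubset(Set(0 ..< 10), Set(0 ..< 20), true, 5)
```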
That explanation makes perfect sense. How about putting it in the source comment? That way it’s easier to point people unfamiliar with the idiom to this file as an example of best practice (link from docs, when we get to it).
- Isolate legacy tests. Add new tests for the same operations, with updated iteration counts and names.
- Rename new tests to follow a consistent naming scheme.
- Add tests for Set.subtracting.
- Add tests on integer sets with 50% and 100% overlap (isSubset, intersection, union, symmetricDifference, subtracting).
@swift-ci please smoke benchmark staging
@swift-ci please smoke test
I added a large number of extra benchmark cases. Having a variety of non-disjoint cases helps to keep the code honest -- otherwise we could end up optimizing for particular inputs and neglect/harm the others. E.g., we currently perform set subtraction by elementwise removal, which optimizes for the small-overlap case with uniquely held storage, but it isn't very good when faced with COW copies or large overlaps.
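The trade-off between the two subtraction strategies can be sketched as follows (illustrative code, not the standard library's actual implementation):

```swift
// Strategy 1: elementwise removal. Cheap per removed element, but the
// first mutation copies all of `a` when its storage is shared (COW),
// so it favors small overlaps with uniquely held storage.
func subtractElementwise<T: Hashable>(_ a: Set<T>, _ b: Set<T>) -> Set<T> {
  var result = a
  for x in b {
    result.remove(x)
  }
  return result
}

// Strategy 2: rebuild from the surviving elements. Does work
// proportional to the result size, which favors large overlaps and
// shared storage.
func subtractByFiltering<T: Hashable>(_ a: Set<T>, _ b: Set<T>) -> Set<T> {
  return Set(a.lazy.filter { !b.contains($0) })
}

// Both agree with the standard library's subtracting on any input.
let a = Set(0 ..< 400)
let b = Set(300 ..< 700)
assert(subtractElementwise(a, b) == a.subtracting(b))
assert(subtractByFiltering(a, b) == a.subtracting(b))
```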
BenchmarkInfo(
  name: "SetIsSubsetInt0",
  runFunction: { n in run_SetIsSubsetInt(setAB, setCD, false, 5000 * n) },
  tags: [.validation, .api, .Set],
The tags are the same for all tests here. How about extracting them into a constant to DRY the declarations a bit?
name: "Set…", tags: tags,
Oh, I forgot Swift isn’t Python and we can’t reorder named parameters. Never mind then… 🤦‍♂️
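For what it's worth, the constant-extraction half of the suggestion works even without reordering, since tags can stay in its declared position. A sketch with stand-ins for the suite's BenchmarkCategory/BenchmarkInfo types (names are illustrative):

```swift
// Stand-ins for the benchmark suite's types, to keep the sketch
// self-contained.
enum BenchmarkCategory { case validation, api, Set }

struct BenchmarkInfo {
  let name: String
  let runFunction: (Int) -> Void
  let tags: [BenchmarkCategory]
}

// Hoist the shared tag list into one constant…
let t: [BenchmarkCategory] = [.validation, .api, .Set]

// …and reuse it, with every argument still in its declared order.
let benchmarks = [
  BenchmarkInfo(name: "SetUnion", runFunction: { _ in }, tags: t),
  BenchmarkInfo(name: "SetExclusiveOr", runFunction: { _ in }, tags: t),
]
assert(benchmarks.allSatisfy { $0.tags.count == 3 })
```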
Excellent work! So glad to see cleanup and rejuvenation of the old crusty tests!