-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Performance regression in "Add externfn macro and correctly label fixed_stack_segments" #8782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Great bug report. |
Updated on 8/29: I re-ran all my tests on the named revisions above, and also against current master. I'm not sure why, but current master is much slower than before. I put all my code into my github repository in the "performance-regression" branch of https://github.com/DaGenix/rust-working. Due to recent changes in master, the original code no longer compiles on master, so, I've updated the code a tiny little bit from before. The tests for test.rc were run as: ../path/to/rustc --opt-level 3 --test test.rc >/dev/null 2>&1 && ./test --bench x8 and the tests for aesbench2.rs were run as: ../path/to/rustc --opt-level 3 --test aesbench2.rs >/dev/null 2>&1 && ./aesbench2 --bench x8 Same as before, test.rc and aesbench2.rs contain pretty much identical code, except that the code in test.rc is split across multiple files and with aesbench2.rs everything is in a single file. From run to run, test results would vary by up to around 0.5 MB/s, so, I wouldn't read too much in to tiny variations. I just tried to report the value that was pretty much the average. If anyone tries to run the benchmarks, I have them setup to report 100x the MB/s value, so, they will report something like 8000 MB/s which is really 80.00 MB/s.
|
cc me |
Closing, these commits have since been removed. |
Move only_used_in_recursion to nursery r? `@llogiq` I still think this is the right thing to do for this lint. See: rust-lang#8782 I want to get this merged before the sync on Thursday, because I want to backport this and the last fix rust-lang#8691 to beta. changelog: Move [`only_used_in_recursion`] to nursery
Rework `only_used_in_recursion` fixes rust-lang#8782 fixes rust-lang#8629 fixes rust-lang#8560 fixes rust-lang#8556 This is a complete rewrite of the lint. This loses some capabilities of the old implementation. Namely the ability to track through tuple and slice patterns, as well as the ability to trace through assignments. The two reported bugs are fixed with this. One was caused by using the name of the method rather than resolving to the `DefId` of the called method. The second was cause by using the existence of a cycle in the dependency graph to determine whether the parameter was used in recursion even though there were other ways to create a cycle in the graph. Implementation wise this switches from using a visitor to walking up the tree from every use of each parameter until it has been determined the parameter is used for something other than recursion. This is likely to perform better as it avoids walking the entire function a second time, and it is unlikely to walk up the HIR tree very much. Some cases would perform worse though. cc `@buttercrab` changelog: Scale back `only_used_in_recursion` to fix false positives changelog: Move `only_used_in_recursion` back to `complexity`
I've noticed a minor performance regression in 0479d94 or maybe 303f650. Unfortunately, 303f650 doesn't compile, so, I can't be quite sure. The revision before that, c178b52, is fine. I've also tested it against current master, 1ac7d80.
I don't have a small test case that demonstrates the issue. All I have is a large one. Check out the "performance-regression" branch of https://github.com/DaGenix/rust-working and then run the benchmark on the test.rc file - the relevant item is aes::bench::aes_bench_x8 (there is lots of messy code in there; I apologize for that, I use that repository mostly as a backup for my main computer).
All tests were run with --opt-level 3 on a Core i5 under 64-bit Ubuntu. Performance is likely drastically different on an architecture where LLVM can't use the SSE instructions, so, I don't know how well this reproduces outside of x86_64.
c178b52 (last fast revision):
0479d94 (first slow revision that compiles):
1ac7d80 (current master):
Here is where it gets interesting: the code being benchmarked is mostly just the contents of the aessafe.rs file. I took code from that file and everything it depends on and moved it into a single file at: https://gist.github.com/DaGenix/6348471. When I benchmark this file using master, I get the faster (83.71 MB/s) performance.
This certainly isn't the ideal test case for this issue, although I'm not sure how to chop down the code to get a better test case at this point.
The text was updated successfully, but these errors were encountered: