Benchmarking #10

Closed
gvanrossum opened this issue Mar 5, 2021 · 8 comments

Comments

@gvanrossum
Collaborator

gvanrossum commented Mar 5, 2021

I'd like to have a benchmark so we have something concrete to target.

There are many benchmarks in PyPerformance, and the full suite takes a long time to run. Some of the benchmarks are ancient (from the FORTRAN days) and focus on numeric array operations. I'm not interested in those (the users who have numeric arrays are all using numpy or a tensor package).

I like benchmarks that represent a more OO style of coding. (Note that even the "float" benchmark, which is supposed to measure float operations such as sqrt() and sin()/cos(), was sped up by an improvement to the LOAD_ATTR opcode that speeds up slots. :-) In PyPerformance there is a group of benchmarks that represents "apps"; we could use that group, or pick one of those benchmarks.

There are also some benchmarks that the Pyston v2 project created: https://github.com/pyston/python-macrobenchmarks/ -- these would be interesting to try, since Pyston has a goal somewhat similar to ours (keep the C API unchanged) and is farther along (claiming to be 20% faster), but it is closed source (for now).

For me, an important requirement is that a benchmark runs fairly quickly. If I have a benchmark that runs for a minute, I'd probably run it a lot to validate the various tweaks I'm experimenting with, even if I knew that the results were pretty noisy. OTOH, if I only had a benchmark that ran for 15 minutes, I'd probably run it only once or twice a day; if it ran for an hour, I'd probably only run it overnight. We should still run all of PyPerformance occasionally, since the core dev team uses it to validate whether a proposed speedup (a) does some good for at least some of the benchmarks, and (b) doesn't slow anything down.
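
To make the "quick, OO-flavored" idea concrete, here is a minimal sketch of a micro-benchmark built on pyperf (the library underlying PyPerformance). The Point class and the attribute-heavy loop are illustrative assumptions, not taken from any existing suite; pyperf itself handles warmups, worker processes, and statistics.

```python
# Minimal sketch: a fast, OO-flavored micro-benchmark using pyperf.
# The Point class and the workload are made up for illustration only.
import pyperf


class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def attr_heavy_loop(n=100_000):
    # Exercises LOAD_ATTR on __slots__ attributes -- the kind of
    # operation the "float"/slots anecdote above is about.
    p = Point(1.5, 2.5)
    total = 0.0
    for _ in range(n):
        total += p.x + p.y
    return total


if __name__ == "__main__":
    runner = pyperf.Runner()
    runner.bench_func("attr_heavy_loop", attr_heavy_loop)
```

A script like this finishes in seconds, and results saved from two interpreters can be compared afterwards with pyperf's compare_to command.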

@ericsnowcurrently
Collaborator

ericsnowcurrently commented Mar 5, 2021

(I posted the following to #11 before I saw this. 🙂)

Benchmarking is critical for weeding out less fruitful ideas early and for other decision-making, as well as for monitoring progress and communicating the merits of our work. So we will be running benchmarks frequently and want the workflow to be as low-overhead as possible.

Relevant topics

  • subsets of a benchmark suite
    • focus on only a subset of the benchmark suite
    • frequent runs on a subset (for speed), periodic runs on the full suite (see the sketch below)
  • adding to pyperformance
    • should we add more benchmarks to the suite (e.g. data-science-oriented ones)?
    • borrow from pyston 2 benchmarks?
  • PGO/LTO
    • per Greg Smith, don't worry about PGO/LTO until ready for upstream PR
  • tooling
    • use pyperformance
    • (maybe) a reporting site (a la speed.python.org)
    • (maybe) a job queue for benchmark runs on a central benchmarking server, with a CLI client to make requests, get results, etc.
  • hardware
    • we're working on getting some dedicated hardware

Profiling is a different question.
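
To make the "subset for speed, full suite periodically" idea concrete, here is a hedged sketch that drives the pyperformance CLI from a small Python script. The chosen benchmark names and the --fast/--benchmarks/-o flags are assumptions based on the pyperformance documentation; verify them against the installed version with pyperformance run --help.

```python
# Hedged sketch: frequent quick runs on a subset, periodic full runs.
# Benchmark names and CLI flags are assumptions; check them against
# the installed pyperformance before relying on this.
import subprocess
import sys

QUICK_SUBSET = ["richards", "deltablue", "chaos"]  # illustrative OO-ish picks


def run_quick(output="quick.json"):
    # Frequent runs: a handful of benchmarks with fast settings.
    subprocess.run(
        [sys.executable, "-m", "pyperformance", "run",
         "--fast", "--benchmarks", ",".join(QUICK_SUBSET),
         "-o", output],
        check=True,
    )


def run_full(output="full.json"):
    # Periodic runs: the whole suite with default (slower) settings.
    subprocess.run(
        [sys.executable, "-m", "pyperformance", "run", "-o", output],
        check=True,
    )


if __name__ == "__main__":
    run_quick()
```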

@gvanrossum
Collaborator Author

Good thoughts!

  • Last time we talked you had an issue running pyperformance on the "disassembler" branch -- have you solved that yet?
  • Once we have a specific benchmark that we want to improve we could profile (or otherwise instrument) CPython as it's running that benchmark, and use the profile data to direct our efforts.
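
A minimal sketch of that profiling step, using the stdlib cProfile module (a native profiler such as Linux perf would be needed to see hot spots inside the eval loop itself). The workload function is a hypothetical placeholder for whichever benchmark we pick.

```python
# Minimal sketch: profile a Python-level workload with cProfile.
# workload() is a hypothetical stand-in for the chosen benchmark;
# C-level hot spots in ceval would need perf or a similar tool.
import cProfile
import pstats


def workload():
    # Placeholder workload; replace with the benchmark under study.
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(10)  # show the 10 most expensive entries
```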

@markshannon
Member

Another benchmark suite to consider is the pyston suite: https://github.com/pyston/python-macrobenchmarks/

@ericsnowcurrently
Collaborator

The impact of many of the optimizations we are pursuing (especially in the eval loop) is tied to specific workloads, sometimes significantly. So it is important that we choose our target workloads conscientiously, and even document the rationale for the choices. In some cases this will also require adding to our benchmark suite.

That said, I do not think we need to focus much at first on the best target workloads, other than to let the idea simmer. We'll be fine for the moment with just the available suites, microbenchmarks and all. I'm sure that it won't take long before we build a stronger intuition for targeting specific workloads with our optimizations, at which point we can apply increasing discipline to our selection (of both benchmarks and optimization ideas). An iterative process like that will allow us to ramp up our effectiveness on this project.

@ericsnowcurrently
Collaborator

FWIW, @zooba pointed me at https://github.com/Azure/azure-sdk-for-python/blob/master/doc/dev/perfstress_tests.md. It's a tool and framework for stress-testing the Azure SDK. It isn't something we would use, but it does offer some insight into a different sort of benchmarking. There may be a lesson or two in there for us, if we don't have other things to look into. 🙂

@gvanrossum
Collaborator Author

Ooh, cool. Maybe we could contact the author and ask them what they have learned.

@markshannon
Member

Emery Berger has done some work on randomized benchmarking to remove a lot of systematic errors.
https://emeryberger.com/
https://emeryberger.com/research/stabilizer/

@ericsnowcurrently
Collaborator

An interesting article on getting reliable benchmark results from a CI system (e.g. GitHub Actions): https://labs.quansight.org/blog/2021/08/github-actions-benchmarks/

faster-cpython locked and limited conversation to collaborators Dec 2, 2021
gramster moved this to Todo in Fancy CPython Board Jan 10, 2022
gramster moved this from Todo to Other in Fancy CPython Board Jan 10, 2022

This issue was moved to a discussion.

