Reduce threading overhead #2740

Open
@kripken

Our threading overhead seems significant. As a baseline, I measured a fixed, purely computational workload: I replaced the body of a pass like precompute with some silly work and measured with the time command. The user time is the same with BINARYEN_CORES=1 (use 1 core) as when running normally with all cores. That makes sense: user time adds up the actual work done across all threads, and the total work is the same. So in that case there isn't much synchronization overhead slowing us down.

But that's not the typical case when running real passes: the user time for multi-core runs can be much higher. See e.g. #2733 (comment); I see similar things locally, with user time 2-3x larger when using 8 threads.

This may be a large speedup opportunity. One possibility is that we often have many tiny functions, and maybe switching between them is costly? Or maybe there is contention on locks (see that last link, although this still happens after that PR, which should have gotten rid of that contention).
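
To make the tiny-functions hypothesis easier to test in isolation, here is a minimal sketch of a micro-benchmark. It is not Binaryen's thread pool: tinyTask, RunStats and runTasks are hypothetical names invented for this illustration, and the shared atomic counter is just one assumed way of handing out work. It runs many small tasks on a given number of threads, claiming them in chunks, and reports CPU time next to wall time so the two can be compared the same way as user vs real from the time command.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <thread>
#include <vector>

// Hypothetical stand-in for running a pass on one tiny function: a small,
// fixed amount of pure computation that depends only on the task index.
inline std::uint64_t tinyTask(std::size_t index) {
  std::uint64_t x = static_cast<std::uint64_t>(index) + 1;
  for (int i = 0; i < 64; i++) {
    x = x * 6364136223846793005ULL + 1442695040888963407ULL;
  }
  return x;
}

struct RunStats {
  std::uint64_t checksum; // XOR of all task results; independent of scheduling
  double cpuSeconds;      // std::clock(): process CPU time on POSIX systems
                          // (on Windows it measures wall time instead)
  double wallSeconds;     // elapsed real time
};

// Runs numTasks tiny tasks on numThreads threads. Each worker claims `chunk`
// consecutive task indices at a time from a shared atomic counter, so a
// small chunk means many more trips to the shared counter.
RunStats runTasks(std::size_t numTasks, unsigned numThreads, std::size_t chunk) {
  assert(numThreads > 0 && chunk > 0);
  std::atomic<std::size_t> next{0};
  std::vector<std::uint64_t> partial(numThreads, 0);

  auto wallStart = std::chrono::steady_clock::now();
  std::clock_t cpuStart = std::clock();

  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; t++) {
    workers.emplace_back([&, t] {
      std::uint64_t local = 0;
      while (true) {
        std::size_t begin = next.fetch_add(chunk);
        if (begin >= numTasks) {
          break;
        }
        std::size_t end = std::min(begin + chunk, numTasks);
        for (std::size_t i = begin; i < end; i++) {
          local ^= tinyTask(i);
        }
      }
      partial[t] = local; // one write per thread, after all its work is done
    });
  }
  for (auto& w : workers) {
    w.join();
  }

  double cpu = static_cast<double>(std::clock() - cpuStart) / CLOCKS_PER_SEC;
  double wall = std::chrono::duration<double>(
                  std::chrono::steady_clock::now() - wallStart).count();

  std::uint64_t checksum = 0;
  for (std::uint64_t p : partial) {
    checksum ^= p;
  }
  return RunStats{checksum, cpu, wall};
}
```

Comparing runTasks(n, 1, 1) with runTasks(n, 8, 1) and runTasks(n, 8, 256) for a large n would show whether per-task handoff alone inflates CPU time on a given machine. If cpuSeconds stays flat there, the extra user time in real passes more likely comes from something else, such as lock contention or allocation.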

The code that uses the thread pool to run passes on functions is here:

// non-debug normal mode, run them in an optimal manner - for locality it is
