[PERF] Dominance frontier calculation appears to be O(n^2) #1300
Comments
I suspect that there's something superlinear going on here, because a from-scratch separate-compilation build of the same project spends ~20x less time on dominance frontier calculation. |
Can you share some code reproducing the issue? What's the size of the bytecode file? Have you tried |
Sadly I can not
112MB
I have not, though I can try it out and see what happens. I should point out that it's using direct-compilation, but doesn't use any of the other "optimization" flags. |
From debugging this a few weeks ago, I remember that the primary issue was that a region of the dominator tree is effectively just a singly-linked list (a node in the dominator tree has a single child, which has a single child, which has a single child, etc...) that it spends most of its time traversing. Is it possible that this is the list of constants/globals or maybe the toplevel statements? |
It appears to make no difference |
Because of this, I think the dominator tree depth was somewhere on the order of ~100,000 |
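A rough cost model for the chain-shaped region described above may help show why the depth matters. It assumes, hypothetically, that a per-node computation visits every node at or below it in the dominator tree; the function name and the model itself are illustrative and not taken from generate.ml.

```python
def chain_traversal_steps(depth: int) -> int:
    """Total nodes visited on a chain-shaped dominator tree of the given depth.

    Assumption (not from generate.ml): the node at level k visits the
    depth - k nodes at or below it, and every node repeats this walk
    independently, with no sharing of results.
    """
    return sum(depth - level for level in range(depth))


if __name__ == "__main__":
    for depth in (500, 5_000, 100_000):
        print(depth, chain_traversal_steps(depth))
```

Under this model the total is depth * (depth + 1) / 2, so a chain of ~100,000 nodes costs about 5 billion visits, and a chain 10x longer costs roughly 100x more.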
THE PLOT THICKENS
Passing With
|
The same is true for jsoo link with --sourcemap: 5.1s |
This one doesn't surprise me as much; most of what I want jsoo-link to do is stitch up sourcemaps correctly, but I wasn't expecting a full order of magnitude. |
#1313 should reduce this difference.
It would be interesting to get numbers about the size of the debug info. |
Passing --sourcemap forces the debug info to be read and used. You can achieve the same thing with
Could you report timings when using
I suspect that
js_of_ocaml/compiler/lib/parse_bytecode.ml, line 400 in 9ea6852
--debug-info and --source-map, but not --pretty). It creates lots of "artificial" blocks for the purpose of holding more debug info.
The change was introduced in 08124c05888dc6cd33d4fa261dcd9803e6864b77 (https://github.com/ocsigen/js_of_ocaml/commit/08124c05888dc6cd33d4fa261dcd9803e6864b77).
js_of_ocaml/compiler/lib/generate.ml, line 1359 in 9ea6852
If my understanding is correct, we should be able to decouple how we track debug info from how we split blocks. Maybe @vouillon has more to say about this. |
I'll try it out |
For what it's worth, I think that my original diagnosis that dominance frontier calculation is quadratic is still correct. This recent development with debug info shows that every pass gets ~2x slower, but codegen gets slower quadratically with respect to code size. |
Here is a minimal repro case.
500 consecutive function calls
5_000 consecutive function calls
10_000 consecutive function calls
|
…s_of_ocaml-ppx_deriving_json, js_of_ocaml-ppx, js_of_ocaml-lwt and js_of_ocaml-compiler (4.1.0)

CHANGES:

## Features/Changes
* Compiler: initial support for OCaml 5 (ocsigen/js_of_ocaml#1265,ocsigen/js_of_ocaml#1303)
* Compiler: bump magic number to match the 5.0.0~alpha0 release (ocsigen/js_of_ocaml#1288)
* Compiler: complain when runtime and compiler built-in primitives disagree (ocsigen/js_of_ocaml#1312)
* Compiler: more efficient implementation of Js_traverse.freevar
* Compiler: more efficient implementation of Js_traverse.rename_variable
* Compiler: --linkall now export all compilation units in addition to primitives (ocsigen/js_of_ocaml#1324)
* Compiler: improve --dynlink, one no longer need to pass --toplevel to use Dynlink (ocsigen/js_of_ocaml#1324)
* Compiler: toplevel runtime files "+toplevel.js" and "+dynlink.js" are added automatically (ocsigen/js_of_ocaml#1324)
* Misc: switch to cmdliner.1.1.0
* Misc: remove old binaries jsoo_link, jsoo_fs
* Misc: remove uchar dep
* Misc: use 4.14 in the CI
* Misc: switch to dune 3
* Lib: add missing options for Intl.DateTimeFormat
* Lib: add missing options for Intl.NumberFormat
* Lib: wheel event binding
* Lib: add normalize in js_string (ES6)
* Lib: more complete transition event bindings
* Lib: remove support for old browser-specific transition events
* Runtime: Implement weak semantic for weak and ephemeron
* Runtime: Implement Gc.finalise_last
* Runtime: Implement buffer for in_channels
* Runtime: add support for unix_opendir, unix_readdir, unix_closedir, win_findfirst, win_findnext, win_findclose
* Runtime: Dont use require when target-env is browser
* Runtime: Implements Parsing.set_trace (ocsigen/js_of_ocaml#1308)
* Test: track external used in the stdlib and unix

## Bug fixes
* Compiler: fix quadratic behavior of dominance frontier (fix ocsigen/js_of_ocaml#1300)
* Compiler: fix rewriter bug in share_constant (fix ocsigen/js_of_ocaml#1247)
* Compiler: fix miscompilation of mutually recursive functions in loop (ocsigen/js_of_ocaml#1321)
* Compiler: fix bug while minifying/renaming try-catch blocks
* Compiler: no dead code elimination for caml_js_get
* Runtime: fix ocamlyacc parse engine (ocsigen/js_of_ocaml#1307)
* Runtime: fix Out_channel.is_buffered, set_buffered
* Runtime: fix format wrt alternative
* Runtime: fix Digest.channel
* Runtime: sync channel seek / pos with the native runtime
* Misc: fix installation with dune 3 without opam
* Node: Only write small chunks to stdout/stderr so they flush
* Deriving: fix for nested polymorphic variants
A large jsoo application I've encountered at work spends about 9 minutes compiling with js_of_ocaml using direct-compilation (separate compilation is much faster). Profiling the compiler shows that almost all of this time is spent in generate.ml, specifically computing dominance frontiers.
I believe that the issue is that although dominance frontier calculation can be done for the whole program in one pass, the implementation in generate.ml will compute the frontier for each node individually, resulting in no opportunity to share results.
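For illustration, here is a minimal sketch of one well-known whole-program formulation (the join-point walk described by Cooper, Harvey and Kennedy). It is not the code in generate.ml and not necessarily what a fix would look like. The names `preds`, `idom` and `dominance_frontiers` are hypothetical, and the sketch assumes immediate dominators are already known, every node is reachable, and the entry node has no predecessors.

```python
from typing import Dict, Hashable, List, Set

Node = Hashable


def dominance_frontiers(
    preds: Dict[Node, List[Node]], idom: Dict[Node, Node]
) -> Dict[Node, Set[Node]]:
    """Compute every node's dominance frontier in a single pass over join points.

    preds maps each node to its CFG predecessors; idom maps each node to its
    immediate dominator (the entry node maps to itself).
    """
    frontiers: Dict[Node, Set[Node]] = {node: set() for node in preds}
    for node, node_preds in preds.items():
        # Only join points (two or more predecessors) appear in any frontier.
        if len(node_preds) < 2:
            continue
        for pred in node_preds:
            # Walk up the dominator tree from the predecessor until reaching
            # the join point's immediate dominator.
            runner = pred
            while runner != idom[node]:
                frontiers[runner].add(node)
                runner = idom[runner]
    return frontiers


if __name__ == "__main__":
    # Diamond-shaped CFG: entry -> a, entry -> b, a -> join, b -> join.
    preds = {"entry": [], "a": ["entry"], "b": ["entry"], "join": ["a", "b"]}
    idom = {"entry": "entry", "a": "entry", "b": "entry", "join": "entry"}
    print(dominance_frontiers(preds, idom))
```

In this formulation, a long straight-line region with no join points contributes no work at all, because the inner walk only starts from nodes with at least two predecessors.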