Rewrite inlining pass #1935

Merged · 7 commits merged into master from inlining on May 21, 2025

Conversation

@vouillon (Member)

No description provided.

@vouillon force-pushed the inlining branch 3 times, most recently from 840420d to 7b64a79 (April 14, 2025 23:08)
@vouillon force-pushed the inlining branch 3 times, most recently from 79446f9 to ba1a622 (April 16, 2025 15:48)
@vouillon force-pushed the inlining branch 4 times, most recently from b62b39e to 5cb6652 (April 24, 2025 17:49)
@vouillon marked this pull request as ready for review April 24, 2025 17:50
@hhugo (Member) commented Apr 25, 2025

I've pushed a fixup to the testsuite.
We should check how this PR affects functor-heavy programs (something using core, maybe).
@TyOverby, could you test this PR on your side?

@hhugo (Member) commented Apr 25, 2025

We need a changelog entry

@hhugo (Member) commented Apr 25, 2025

I'm not certain I read the benchmark correctly.
It seems that partial render table sees a 10% code size increase, a ~50% memory increase, and a 30% compilation time increase for no runtime improvement.

@hhugo (Member) commented May 6, 2025

Maybe we can wait for #1962 to get better measurements.

@hhugo (Member) commented May 7, 2025

We don't have the latest benchmark.
The last result we have shows a runtime regression for ocamlc (maybe some noise?).

With this PR, we seem to double the time spent in the inline pass. We can probably live with that.

@hhugo force-pushed the inlining branch 2 times, most recently from 10a1ba8 to 6aaf9ad (May 7, 2025 21:59)
@vouillon (Member, Author)

> out of curiosity, what was the osx / node-24 issue? consuming too much memory?

@TyOverby See my comment above.

@vouillon (Member, Author)

> I'm not certain I read the benchmark correctly. It seems that partial render table sees a 10% code size increase, a ~50% memory increase, and a 30% compilation time increase for no runtime improvement.

Right, the aggressive inlining of functors does not really seem to result in any runtime improvement with js_of_ocaml. So it is now enabled only with wasm_of_ocaml.
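
For illustration, here is a minimal sketch (hypothetical names, not code from this PR) of why inlining a functor application can pay off: once the functor body is inlined at the application site, calls through the argument module become direct and can be optimized further.

  (* Hypothetical example. Inlining the application of [MakeMax]
     exposes the concrete [compare] at the call site, so the
     indirect call can become a direct integer comparison. *)
  module type ORD = sig
    type t
    val compare : t -> t -> int
  end

  module MakeMax (O : ORD) = struct
    (* Without inlining, [O.compare] is an opaque indirect call. *)
    let max a b = if O.compare a b >= 0 then a else b
  end

  module IntMax = MakeMax (struct
    type t = int
    let compare = Int.compare
  end)

  let () = assert (IntMax.max 3 7 = 7)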

@hhugo (Member) commented May 16, 2025

I've pushed commits to only inline (small) functors in o3 with jsoo. Let's wait for the benchmarks

@hhugo force-pushed the inlining branch 2 times, most recently from 13f7293 to 4ac6713 (May 16, 2025 10:58)
@hhugo (Member) commented May 16, 2025

fannkuch_redux and fft seem to take longer now. Can you take a look? Compilation time increases everywhere, but I guess we can live with that given the recent improvements everywhere else.

@vouillon (Member, Author)

For fft, it's because a function no longer gets inlined: I have reduced the inlining limit from 200 down to 150.
For fannkuch_redux, the function fannkuch is no longer inlined at toplevel, so it is not optimized under the assumption that n = 10.

  let n = 10 in
  let _maxflips, _checksum = fannkuch n in

Inlining small functions makes a significant difference for raytrace.
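
For illustration, a rough sketch of the specialization at stake, with a stand-in body rather than the real benchmark: when the call is inlined at toplevel, n becomes the literal 10, so loop bounds are known constants; when it is not, n remains an opaque argument.

  (* Stand-in body, not the real benchmark. If [fannkuch 10] is
     inlined at toplevel, [n] below becomes the constant 10, so the
     loop bound is known and further simplification is possible. *)
  let fannkuch n =
    let flips = ref 0 in
    for i = 1 to n do
      flips := !flips + i
    done;
    (!flips, !flips * n)

  let () =
    let n = 10 in
    let _maxflips, _checksum = fannkuch n in
    ()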

@hhugo (Member) commented May 16, 2025

> For fft, it's because a function no longer gets inlined: I have reduced the inlining limit from 200 down to 150. For fannkuch_redux, the function fannkuch is no longer inlined at toplevel, so it is not optimized under the assumption that n = 10.
>
>   let n = 10 in
>   let _maxflips, _checksum = fannkuch n in
>
> Inlining small functions makes a significant difference for raytrace.

Are you OK to merge in the current state?

@hhugo (Member) commented May 16, 2025

> Apologies for the delay; I didn't see this thread for a while. We should have some test and benchmark results ready for you next week.

@TyOverby, any update on this?

@TyOverby (Collaborator)

We've been trying to import these changes (well, really the base revision, so that we have a good point to compare benchmarks with) and have hit a very large number of conflicts with our internal patches due to the recent PRs that have been merged. I think we're close to being ready to test this PR; my guess is next week.

@vouillon (Member, Author)

> Are you OK to merge in the current state?

I would prefer to wait for some feedback from Ty.

@rickyvetter (Contributor)

We were able to pull this in internally, and performance looks very good! Substantially faster and more consistent on PRT and our other internal benchmarks. For Bonsai benchmarks we are seeing a 50%-80% reduction in benchmarking times. Binary size shows a <1% increase with separate compilation and 0-2% for whole-program compilation. There are a couple of outlier programs that increase in the 10-16% range.

We've reached out about a miscompilation issue on Slack; initially we believed it was unrelated to this PR directly, but it looks like applying this patch actually causes a very similar miscompilation in a program that didn't have it before. This one new case is the only test we have failing, and I suspect that if we resolve the minimal repro for the original issue, we might also see how to resolve this new instance in this PR.

@TyOverby (Collaborator)

For the Bonsai benchmarks, I suspect that the large improvements are due to the inlining-related memory leak being resolved by this PR.

@hhugo (Member) commented May 21, 2025

> We were able to pull this in internally, and performance looks very good! Substantially faster and more consistent on PRT and our other internal benchmarks. For Bonsai benchmarks we are seeing a 50%-80% reduction in benchmarking times. Binary size shows a <1% increase with separate compilation and 0-2% for whole-program compilation. There are a couple of outlier programs that increase in the 10-16% range.
>
> We've reached out about a miscompilation issue on Slack; initially we believed it was unrelated to this PR directly, but it looks like applying this patch actually causes a very similar miscompilation in a program that didn't have it before. This one new case is the only test we have failing, and I suspect that if we resolve the minimal repro for the original issue, we might also see how to resolve this new instance in this PR.

Anything to say on compilation times?

vouillon and others added 7 commits May 21, 2025 09:34

- We are a lot more aggressive at inlining functor-like functions in wasm_of_ocaml, since this may enable further optimizations
- We are more cautious at inlining nested functions, since this can result in memory leaks (see the sketch below)
- We inline a larger class of small functions
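
On the nested-functions point, a hedged sketch of one way such inlining can leak (illustrative names, not code from this PR): sibling closures created in the same scope may share a single environment record in the generated JavaScript, so a long-lived closure can retain values that only a short-lived sibling captured, and inlining nested functions enlarges that shared scope.

  (* Illustrative only. [report] captures the large buffer [big];
     the escaping closure below does not. If inlining merges their
     scopes in the generated JavaScript, the shared environment can
     keep [big] reachable for the whole lifetime of the handler. *)
  let make_handler () =
    let big = Bytes.create (10 * 1024 * 1024) in
    let report () = print_int (Bytes.length big) in
    report ();
    (* long-lived closure that escapes [make_handler] *)
    fun () -> print_endline "tick"

  let _handler : unit -> unit = make_handler ()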
@hhugo merged commit 3695d26 into master on May 21, 2025
25 of 26 checks passed
@hhugo deleted the inlining branch (May 21, 2025 07:37)
@hhugo (Member) commented May 21, 2025

Let's merge and move on from there.
