Large memory usage and long time on compiling large number of println #86244
Comments
Another test:
echo 'fn main() {' > hw.rs
for i in {0..16384}
do
echo ' dbg!("Hello, world!");' >> hw.rs
done
echo '}' >> hw.rs
/usr/bin/time -f "%Uuser %Ssystem %Eelapsed %PCPU (%Xtext+%Ddata %Mmax)k\n%Iinputs+%Ooutputs (%Fmajor+%Rminor)pagefaults %Wswaps\nUsed %MKbytes Max Resident MEM" rustc hw.rs
log:
I simplified the code to tons of repeated statements. Generator script:
#!/usr/bin/env bash
exec >test.rs
echo 'fn main() {'
for i in $(seq 16384); do
echo '&"foo";'
done
echo '}'
It costs 2.6GB of memory and 50s to compile. While adding one more reference ( For comparison,
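For anyone who wants to vary the statement or the repetition count when reproducing this, here is a minimal sketch of the same generator in Rust. The function name `generate_source` is hypothetical and chosen only for this illustration; the statement `&"foo";`, the count 16384, and the output file `test.rs` are taken from the bash script.

```rust
use std::fs;
use std::io;

/// Build the source of a `main` function whose body repeats `statement`
/// `count` times, one statement per line.
/// (Hypothetical helper for this illustration; not part of any tool.)
fn generate_source(statement: &str, count: usize) -> String {
    let mut src = String::with_capacity(16 + count * (statement.len() + 1));
    src.push_str("fn main() {\n");
    for _ in 0..count {
        src.push_str(statement);
        src.push('\n');
    }
    src.push_str("}\n");
    src
}

fn main() -> io::Result<()> {
    // Same statement, count, and output file as the bash generator.
    let src = generate_source("&\"foo\";", 16384);
    fs::write("test.rs", src)?;
    Ok(())
}
```

Changing the count (for example doubling it) and timing `rustc test.rs` each time is one way to check whether compile time grows faster than linearly.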
@rustbot label I-compilemem I-compiletime T-compiler
Profiling shows that 97% of the time is spent in mir_borrowck.
Did some further analysis with
The majority of the time is spent in
I don't understand the code well enough to draw any conclusions or make any changes, so perhaps someone more familiar with this code could give some hints.
#50994 uses 5,000
The example by @oxalica shows that this issue has nothing to do with #50994 is related to an LLVM performance issue, while this one is related to borrow checking. Note that when that issue was posted, Rust did not yet have stable NLL, so it is certainly a different issue from this one.
Both issues were opened with the premise of benchmarking the compiler with the same
perf: Don't track specific live points for promoteds
We don't query this information out of the promoted (it's basically a single "unit" regardless of the complexity within it), and this saves on re-initializing the SparseIntervalMatrix's backing IndexVec with mostly empty rows for all of the leading regions in the function. Typical promoteds will only contain a few regions that need to be uplifted, while the parent function can have thousands.
For a simple function repeating println!("Hello world"); 50,000 times, this reduces compile times from 90 to 15 seconds in debug mode. The previous implementation's re-initialization led to an overall roughly n^2 runtime, as each promoted initialized slots for ~n regions; now we scale closer to linearly (5000 hello worlds takes 1.1 seconds).
cc rust-lang#50994, rust-lang#86244
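To make the n^2 argument in this commit message concrete, here is a toy model, not rustc's actual code. `ToyMatrix` and `simulate` are hypothetical names invented for this sketch; the only thing borrowed from the description is the idea that a sparse matrix backed by a growable vector must create (mostly empty) rows for every region index below the one being written. The model assumes each promoted touches one region whose index sits right after all of the parent function's leading regions.

```rust
/// Toy stand-in for a sparse matrix with one optional row per region.
/// (Hypothetical; rustc's SparseIntervalMatrix is more involved.)
struct ToyMatrix {
    rows: Vec<Option<Vec<usize>>>,
    rows_initialized: usize,
}

impl ToyMatrix {
    fn new() -> Self {
        ToyMatrix { rows: Vec::new(), rows_initialized: 0 }
    }

    /// Record `point` as live in `row`, growing the backing vector with
    /// empty rows for every smaller index that does not exist yet.
    fn insert(&mut self, row: usize, point: usize) {
        if row >= self.rows.len() {
            self.rows_initialized += row + 1 - self.rows.len();
            self.rows.resize(row + 1, None);
        }
        self.rows[row].get_or_insert_with(Vec::new).push(point);
    }
}

/// Count rows initialized while processing `promoteds` promoted constants,
/// each starting from a fresh matrix. Assumption of this sketch: every
/// promoted's region index comes after `leading_regions` parent regions.
fn simulate(promoteds: usize, leading_regions: usize, track_live_points: bool) -> usize {
    let mut total = 0;
    for _ in 0..promoteds {
        let mut matrix = ToyMatrix::new();
        if track_live_points {
            matrix.insert(leading_regions, 0);
        }
        total += matrix.rows_initialized;
    }
    total
}

fn main() {
    for n in [1_000usize, 2_000, 4_000] {
        let tracked = simulate(n, n, true);
        let untracked = simulate(n, n, false);
        println!("n = {n}: rows initialized with tracking = {tracked}, without = {untracked}");
    }
}
```

In this model, tracking costs n * (n + 1) row initializations for n promoteds, so doubling n roughly quadruples the work, while skipping the tracking removes that term entirely. The real compiler still does other per-statement work, which is why the commit message says "closer to linearly" rather than free.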
As of today, rustc 1.86.0-nightly (8361aef 2025-01-14) compiles under 2s and < 250MB.
rustc -Copt-level=3 -Ztime-passes -Awarnings ./test.rs
time: 0.009; rss: 49MB -> 55MB ( +6MB) parse_crate
time: 0.000; rss: 59MB -> 59MB ( +1MB) crate_injection
time: 0.003; rss: 59MB -> 72MB ( +13MB) expand_crate
time: 0.003; rss: 59MB -> 72MB ( +13MB) macro_expand_crate
time: 0.000; rss: 72MB -> 72MB ( +0MB) maybe_building_test_harness
time: 0.000; rss: 72MB -> 74MB ( +2MB) finalize_macro_resolutions
time: 0.000; rss: 74MB -> 74MB ( +0MB) late_resolve_crate
time: 0.000; rss: 74MB -> 74MB ( +0MB) resolve_postprocess
time: 0.001; rss: 72MB -> 74MB ( +2MB) resolve_crate
time: 0.006; rss: 74MB -> 81MB ( +7MB) looking_for_entry_point
time: 0.000; rss: 82MB -> 82MB ( +0MB) unused_lib_feature_checking
time: 0.007; rss: 74MB -> 82MB ( +7MB) misc_checking_1
time: 0.001; rss: 82MB -> 84MB ( +2MB) coherence_checking
time: 0.014; rss: 82MB -> 94MB ( +12MB) type_check_crate
time: 0.976; rss: 94MB -> 205MB ( +111MB) MIR_borrow_checking
time: 0.108; rss: 205MB -> 208MB ( +3MB) MIR_effect_checking
time: 0.005; rss: 208MB -> 208MB ( +0MB) lint_checking
time: 0.007; rss: 208MB -> 208MB ( +0MB) misc_checking_3
time: 0.000; rss: 208MB -> 209MB ( +0MB) monomorphization_collector_root_collections
time: 0.018; rss: 209MB -> 216MB ( +7MB) monomorphization_collector_graph_walk
time: 0.000; rss: 216MB -> 216MB ( +0MB) partition_and_assert_distinct_symbols
time: 0.001; rss: 218MB -> 223MB ( +5MB) write_allocator_module
time: 0.001; rss: 227MB -> 241MB ( +14MB) codegen_to_LLVM_IR
time: 0.022; rss: 208MB -> 242MB ( +33MB) codegen_crate
time: 0.008; rss: 237MB -> 201MB ( -36MB) LLVM_passes
time: 0.002; rss: 199MB -> 197MB ( -1MB) finish_ongoing_codegen
time: 0.034; rss: 198MB -> 198MB ( +0MB) run_linker
time: 0.035; rss: 197MB -> 198MB ( +1MB) link_binary
time: 0.035; rss: 197MB -> 198MB ( +1MB) link_crate
time: 0.036; rss: 197MB -> 198MB ( +1MB) link
time: 1.198; rss: 32MB -> 101MB ( +70MB) total
The same one but with
rustc -Copt-level=3 -Ztime-passes -Awarnings ./test2.rs
time: 0.013; rss: 49MB -> 59MB ( +10MB) parse_crate
time: 0.000; rss: 63MB -> 63MB ( +1MB) crate_injection
time: 0.246; rss: 63MB -> 138MB ( +75MB) expand_crate
time: 0.246; rss: 63MB -> 138MB ( +75MB) macro_expand_crate
time: 0.001; rss: 138MB -> 139MB ( +0MB) AST_validation
time: 0.005; rss: 139MB -> 140MB ( +1MB) finalize_macro_resolutions
time: 0.037; rss: 140MB -> 141MB ( +1MB) late_resolve_crate
time: 0.002; rss: 141MB -> 141MB ( +0MB) resolve_postprocess
time: 0.046; rss: 139MB -> 141MB ( +2MB) resolve_crate
time: 0.003; rss: 153MB -> 152MB ( -1MB) drop_ast
time: 0.060; rss: 141MB -> 153MB ( +12MB) looking_for_entry_point
time: 0.078; rss: 141MB -> 153MB ( +12MB) misc_checking_1
time: 0.005; rss: 153MB -> 155MB ( +2MB) coherence_checking
time: 0.644; rss: 153MB -> 281MB ( +127MB) type_check_crate
time: 1.426; rss: 281MB -> 444MB ( +164MB) MIR_borrow_checking
time: 0.182; rss: 444MB -> 440MB ( -4MB) MIR_effect_checking
time: 0.026; rss: 440MB -> 441MB ( +0MB) module_lints
time: 0.026; rss: 440MB -> 441MB ( +0MB) lint_checking
time: 0.029; rss: 441MB -> 441MB ( +0MB) privacy_checking_modules
time: 0.059; rss: 440MB -> 441MB ( +0MB) misc_checking_3
time: 0.000; rss: 441MB -> 441MB ( +0MB) monomorphization_collector_root_collections
time: 1.775; rss: 441MB -> 699MB ( +258MB) monomorphization_collector_graph_walk
time: 0.002; rss: 699MB -> 699MB ( +0MB) partition_and_assert_distinct_symbols
time: 0.000; rss: 701MB -> 704MB ( +3MB) write_allocator_module
time: 0.133; rss: 710MB -> 732MB ( +21MB) codegen_to_LLVM_IR
time: 1.913; rss: 441MB -> 732MB ( +291MB) codegen_crate
time: 12.927; rss: 732MB -> 608MB ( -124MB) LLVM_passes
time: 0.001; rss: 608MB -> 604MB ( -4MB) join_worker_thread
time: 12.893; rss: 536MB -> 604MB ( +68MB) finish_ongoing_codegen
time: 0.046; rss: 604MB -> 445MB ( -159MB) run_linker
time: 0.047; rss: 604MB -> 445MB ( -158MB) link_binary
time: 0.047; rss: 604MB -> 445MB ( -158MB) link_crate
time: 0.047; rss: 604MB -> 445MB ( -158MB) link
time: 17.606; rss: 32MB -> 111MB ( +79MB) total
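When comparing -Ztime-passes runs like the two above, it can help to pull out the slowest pass mechanically. The following is a rough sketch that assumes each entry has the shape seen in this comment ("time: <seconds>; rss: <before>MB -> <after>MB ( <delta>MB) <pass name>"); that shape is an observation from this output, not a documented stable format. `PassTiming`, `parse_line`, and `slowest` are hypothetical names for this illustration.

```rust
/// One parsed `-Ztime-passes` entry (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct PassTiming {
    name: String,
    seconds: f64,
    rss_after_mb: u64,
}

/// Parse one entry such as
/// "time: 0.976; rss: 94MB -> 205MB ( +111MB) MIR_borrow_checking".
/// Returns `None` for lines that do not match the assumed shape.
fn parse_line(line: &str) -> Option<PassTiming> {
    let rest = line.trim().strip_prefix("time:")?;
    let (secs, rest) = rest.split_once(';')?;
    let seconds: f64 = secs.trim().parse().ok()?;
    let rest = rest.trim().strip_prefix("rss:")?;
    let (_before, rest) = rest.split_once("->")?;
    let (after, rest) = rest.trim().split_once("MB")?;
    let rss_after_mb: u64 = after.trim().parse().ok()?;
    // The pass name follows the closing parenthesis of the RSS delta.
    let (_delta, name) = rest.rsplit_once(')')?;
    Some(PassTiming { name: name.trim().to_string(), seconds, rss_after_mb })
}

/// Return the entry with the largest time, ignoring the final `total` row.
fn slowest<'a>(lines: impl IntoIterator<Item = &'a str>) -> Option<PassTiming> {
    lines
        .into_iter()
        .filter_map(parse_line)
        .filter(|p| p.name != "total")
        .max_by(|a, b| a.seconds.total_cmp(&b.seconds))
}

fn main() {
    // A few entries copied from the first run in this comment.
    let sample = [
        "time: 0.014; rss: 82MB -> 94MB ( +12MB) type_check_crate",
        "time: 0.976; rss: 94MB -> 205MB ( +111MB) MIR_borrow_checking",
        "time: 0.108; rss: 205MB -> 208MB ( +3MB) MIR_effect_checking",
        "time: 1.198; rss: 32MB -> 101MB ( +70MB) total",
    ];
    if let Some(p) = slowest(sample) {
        println!("slowest pass: {} ({} s, rss after {} MB)", p.name, p.seconds, p.rss_after_mb);
    }
}
```

Note that the entries are nested (for example, link appears to include link_crate), so summing all rows would double-count; picking the maximum or comparing the same pass across runs avoids that problem.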
Not sure whether it's a known bug.
I tried this code:
(To save you some time, you can download the file here directly rather than generating it yourself.)
I expected to see this happen: it should take a reasonable amount of memory and time to compile.
Instead, this happened: on my machine, it took up to 34GB of memory and 2.5min just to compile this.
Meta
rustc --version --verbose: