Skip to content

Long Compile Time (with Optimization) with Large Arrays (2-3 hours) #49330

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
telboon opened this issue Mar 24, 2018 · 5 comments
Open

Long Compile Time (with Optimization) with Large Arrays (2-3 hours) #49330

telboon opened this issue Mar 24, 2018 · 5 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@telboon
Copy link

telboon commented Mar 24, 2018

Summary

I have gotten very long compile time (2 hrs plus) when I tried to compile a code that has a large array inbuilt in the code when I turn optimization on. When the code is compiled without optimization, it has a acceptable compile time (10+ seconds).

The array contains 130,000 + strs.

Details

I tried this code:

$ cargo build
   Compiling alliteration-gen v0.1.0 (file:///<REDACTED>)
    Finished dev [unoptimized + debuginfo] target(s) in 11.23 secs
$ cargo build --release
   Compiling alliteration-gen v0.1.0 (file:///<REDACTED>)
^C

I expected to see this happen: With the non-optimized code running in 11 seconds, the optimized one shouldn't take more than a few minutes.

Instead, this happened: When i waited for it to compile, it took 10039.81 secs (2 hrs 47 mins).

I replicated the problem again on a different Cargo project path and different machines and faced the same issue (didn't fully compile them as it would take very long. I waited for 5 minutes for each machine).

Environment

Machines tested:

  1. Ubuntu 16.04 (Dell XPS 9360 Intel i5-7200U; 8GB RAM)
  2. Kali Linux 2018.01 (Old Laptop with i5)
  3. Windows 10 with msvc (Desktop Intel i5 4570 3.2GHz; 8GB RAM)

Cargo project files attached.
nolib.zip

Meta

rustc --version --verbose:
rustc 1.22.1
binary: rustc
commit-hash: unknown
commit-date: unknown
host: x86_64-unknown-linux-gnu
release: 1.22.1
LLVM version: 4.0

rustc 1.24.1
binary: rustc
commit-hash: unknown
commit-date: unknown
host: x86_64-unknown-linux-gnu
release: 1.24.1
LLVM version: 4.0

rustc 1.24.1 (d3ae9a9 2018-02-27)
binary: rustc
commit-hash: d3ae9a9
commit-date: 2018-02-27
host: x86_64-pc-windows-msvc
release: 1.24.1
LLVM version: 4.0

@kennytm kennytm added I-compiletime Issue: Problems and improvements with respect to compile times. A-const-fn C-bug Category: This is a bug. and removed A-const-fn labels Mar 24, 2018
@matthiaskrgr
Copy link
Member

matthiaskrgr commented Mar 24, 2018

Seems llvm is bottlenecking here

  21,28%  librustc_trans-llvm.so                   [.] llvm::ValueHandleBase::AddToUseList
  12,09%  librustc_trans-llvm.so                   [.] llvm::Value::getContext
  11,45%  librustc_trans-llvm.so                   [.] llvm::ValueHandleBase::RemoveFromUseList
  10,25%  librustc_trans-llvm.so                   [.] llvm::ValueHandleBase::AddToExistingUseList
   9,40%  librustc_trans-llvm.so                   [.] llvm::Use::getImpliedUser
   3,45%  librustc_trans-llvm.so                   [.] llvm::InstCombiner::visitAllocSite
   1,12%  librustc_trans-llvm.so                   [.] llvm::SmallVectorTemplateBase<llvm::WeakTrackingVH, false>::grow
   0,98%  librustc_trans-llvm.so                   [.] llvm::GetElementPtrInst::hasAllZeroIndices
   0,96%  [kernel]                                 [k] clear_page_erms
   0,68%  libc-2.26.so                             [.] __memmove_sse2_unaligned_erms
   0,54%  librustc_trans-llvm.so                   [.] llvm::Use::getUser
   0,54%  [kernel]                                 [.] native_irq_return_iret
   0,44%  [i915]                                   [k] fw_domains_get

(can repro with rustc 1.26.0-nightly (c08480fce 2018-03-23) and doing release build)

@matthiaskrgr
Copy link
Member

Ok, it only needed 3.5 hours to compiler on a single core!

timing information

   Compiling alliteration-gen v0.1.0 (file:///tmp/test/nolib)
  time: 0.311; rss: 101MB       parsing
  time: 0.000; rss: 101MB       recursion limit
  time: 0.000; rss: 101MB       crate injection
  time: 0.000; rss: 101MB       plugin loading
  time: 0.000; rss: 101MB       plugin registration
  time: 0.044; rss: 124MB       expansion
  time: 0.000; rss: 124MB       maybe building test harness
  time: 0.001; rss: 124MB       maybe creating a macro crate
  time: 0.006; rss: 124MB       creating allocators
  time: 0.004; rss: 124MB       AST validation
  time: 0.006; rss: 128MB       name resolution
  time: 0.001; rss: 128MB       complete gated feature checking
  time: 0.017; rss: 146MB       lowering ast -> hir
  time: 0.018; rss: 146MB       early lint checks
warning: The output of `-Z time-llvm-passes` will only reflect timings of re-translated modules when used with incremental compilation
  time: 0.057; rss: 162MB       indexing hir
  time: 0.000; rss: 124MB       load query result cache
  time: 0.000; rss: 124MB       looking for entry point
  time: 0.000; rss: 124MB       looking for plugin registrar
  time: 0.001; rss: 124MB       loop checking
  time: 0.000; rss: 124MB       attribute checking
  time: 0.002; rss: 124MB       stability checking
  time: 0.019; rss: 139MB       type collecting
  time: 0.000; rss: 139MB       outlives testing
  time: 0.000; rss: 139MB       impl wf inference
  time: 0.000; rss: 139MB       coherence checking
  time: 0.000; rss: 139MB       variance testing
  time: 0.012; rss: 143MB       wf checking
  time: 0.000; rss: 143MB       item-types checking
  time: 1.599; rss: 204MB       item-bodies checking
  time: 0.105; rss: 211MB       rvalue promotion
  time: 0.034; rss: 211MB       privacy checking
  time: 0.002; rss: 211MB       intrinsic checking
  time: 0.002; rss: 211MB       match checking
  time: 0.003; rss: 211MB       liveness checking
  time: 1.246; rss: 423MB       borrow checking
  time: 0.000; rss: 423MB       MIR borrow checking
  time: 0.000; rss: 423MB       MIR effect checking
  time: 0.003; rss: 423MB       death checking
  time: 0.000; rss: 423MB       unused lib feature checking
  time: 0.098; rss: 423MB       lint checking
  time: 0.000; rss: 423MB       dumping chalk-like clauses
  time: 0.000; rss: 423MB       resolving dependency formats
    time: 0.000; rss: 423MB     write metadata
    time: 0.450; rss: 450MB     translation item collection
    time: 0.000; rss: 450MB     codegen unit partitioning
    time: 0.000; rss: 450MB     write allocator module
    time: 1.286; rss: 622MB     translate to LLVM IR
    time: 0.000; rss: 622MB     assert dep graph
    time: 0.000; rss: 622MB     serialize dep graph
  time: 1.739; rss: 622MB       translation
    time: 0.833; rss: 384MB     llvm function passes [alliteration_gen0]
    time: 9852.241; rss: 546MB  llvm module passes [alliteration_gen0]
    time: 1685.726; rss: 463MB  codegen passes [alliteration_gen0]
  time: 11539.432; rss: 163MB   LLVM passes
===-------------------------------------------------------------------------===
                              Register Allocation
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0205 seconds (0.0205 wall clock)
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0173 ( 99.8%)   0.0032 (100.0%)   0.0205 ( 99.8%)   0.0205 ( 99.8%)  Seed Live Regs
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.2%)   0.0000 (  0.2%)  Evict
   0.0174 (100.0%)   0.0032 (100.0%)   0.0205 (100.0%)   0.0205 (100.0%)  Total
===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 1535.6036 seconds (1536.6184 wall clock)
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  1529.1632 ( 99.6%)   0.2696 ( 73.8%)  1529.4328 ( 99.6%)  1530.4475 ( 99.6%)  DAG Combining 1
   2.3814 (  0.2%)   0.0135 (  3.7%)   2.3949 (  0.2%)   2.3950 (  0.2%)  DAG Combining 2
   1.9965 (  0.1%)   0.0300 (  8.2%)   2.0265 (  0.1%)   2.0265 (  0.1%)  Instruction Selection
   0.8134 (  0.1%)   0.0065 (  1.8%)   0.8200 (  0.1%)   0.8200 (  0.1%)  DAG Legalization
   0.4507 (  0.0%)   0.0232 (  6.3%)   0.4739 (  0.0%)   0.4739 (  0.0%)  Instruction Scheduling
   0.2092 (  0.0%)   0.0000 (  0.0%)   0.2092 (  0.0%)   0.2092 (  0.0%)  Type Legalization
   0.1781 (  0.0%)   0.0168 (  4.6%)   0.1949 (  0.0%)   0.1949 (  0.0%)  Instruction Creation
   0.0343 (  0.0%)   0.0000 (  0.0%)   0.0343 (  0.0%)   0.0343 (  0.0%)  Vector Legalization
   0.0111 (  0.0%)   0.0056 (  1.5%)   0.0167 (  0.0%)   0.0167 (  0.0%)  Instruction Scheduling Cleanup
   0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0004 (  0.0%)  DAG Combining after legalize types
  1535.2383 (100.0%)   0.3653 (100.0%)  1535.6036 (100.0%)  1536.6184 (100.0%)  Total
===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.3086 seconds (0.3233 wall clock)
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.1348 ( 54.3%)   0.0351 ( 58.2%)   0.1699 ( 55.1%)   0.1843 ( 57.0%)  Debug Info Emission
   0.1115 ( 44.9%)   0.0247 ( 40.9%)   0.1362 ( 44.2%)   0.1366 ( 42.2%)  DWARF Exception Writer
   0.0019 (  0.8%)   0.0005 (  0.9%)   0.0024 (  0.8%)   0.0024 (  0.7%)  DWARF Debug Writer
   0.2482 (100.0%)   0.0603 (100.0%)   0.3086 (100.0%)   0.3233 (100.0%)  Total
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 11517.6105 seconds (11536.0309 wall clock)
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  9049.1177 ( 84.2%)  764.1362 ( 99.4%)  9813.2539 ( 85.2%)  9830.6488 ( 85.2%)  Combine redundant instructions
  1536.5964 ( 14.3%)   0.4161 (  0.1%)  1537.0125 ( 13.3%)  1538.0273 ( 13.3%)  Demanded bits analysis
  138.7323 (  1.3%)   3.3021 (  0.4%)  142.0344 (  1.2%)  142.1022 (  1.2%)  Machine Instruction Scheduler
  10.6652 (  0.1%)   0.0102 (  0.0%)  10.6754 (  0.1%)  10.6762 (  0.1%)  Dead Store Elimination
   2.7106 (  0.0%)   0.0500 (  0.0%)   2.7606 (  0.0%)   2.7606 (  0.0%)  SLP Vectorizer
   0.7723 (  0.0%)   0.0235 (  0.0%)   0.7958 (  0.0%)   0.7958 (  0.0%)  Combine redundant instructions
   0.7857 (  0.0%)   0.0069 (  0.0%)   0.7927 (  0.0%)   0.7927 (  0.0%)  Combine redundant instructions
   0.7482 (  0.0%)   0.0240 (  0.0%)   0.7723 (  0.0%)   0.7724 (  0.0%)  Combine redundant instructions
   0.7228 (  0.0%)   0.0233 (  0.0%)   0.7461 (  0.0%)   0.7461 (  0.0%)  Combine redundant instructions
   0.7324 (  0.0%)   0.0135 (  0.0%)   0.7459 (  0.0%)   0.7459 (  0.0%)  Combine redundant instructions
   0.5270 (  0.0%)   0.0265 (  0.0%)   0.5536 (  0.0%)   0.5536 (  0.0%)  X86 Assembly Printer
   0.4229 (  0.0%)   0.0252 (  0.0%)   0.4481 (  0.0%)   0.4481 (  0.0%)  Early CSE
   0.3849 (  0.0%)   0.0412 (  0.0%)   0.4260 (  0.0%)   0.4260 (  0.0%)  Global Value Numbering
   0.3855 (  0.0%)   0.0135 (  0.0%)   0.3991 (  0.0%)   0.3991 (  0.0%)  Combine redundant instructions
   0.3750 (  0.0%)   0.0133 (  0.0%)   0.3883 (  0.0%)   0.3883 (  0.0%)  Combine redundant instructions
   0.3343 (  0.0%)   0.0434 (  0.0%)   0.3776 (  0.0%)   0.3776 (  0.0%)  Called Value Propagation
   0.3229 (  0.0%)   0.0170 (  0.0%)   0.3399 (  0.0%)   0.3399 (  0.0%)  Early CSE w/ MemorySSA
   0.2652 (  0.0%)   0.0236 (  0.0%)   0.2887 (  0.0%)   0.2887 (  0.0%)  Value Propagation
....
    Finished release [optimized] target(s) in 11580.25 secs

@kennytm kennytm added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Mar 24, 2018
@bluss
Copy link
Member

bluss commented Mar 24, 2018

If you come across this issue and just wonder how to work around it, one can use slices instead of arrays for this code.

The diff for the reproducing code is like this, and then it compiles reasonably quickly (20 seconds) in release mode:

diff --git src/main.rs src/main.rs
index 6a6d2a3..69be593 100644
--- src/main.rs
+++ src/main.rs
@@ -8,7 +8,7 @@ fn main() {
 }
 
 fn gen_adj(alpha:&String) -> String {
-    let list_of_adj = [
+    let list_of_adj: &[&str] = &[
         "a-ok",
         "a-okay",
         "a-one",
@@ -21268,7 +21268,7 @@ fn gen_adj(alpha:&String) -> String {
 }
 
 fn gen_noun(alpha:&String) -> String {
-    let list_of_noun= [
+    let list_of_noun: &[&str] = &[
         "a'man",
         "a-bomb",
         "a-horizon",

The original code was using very large arrays, while using a static literal slice works just as well for the use case.

@oli-obk
Copy link
Contributor

oli-obk commented Mar 24, 2018

The array was not const promoted I presume, while the slice is. Thus LLVM is trying to cope with a gigantic stack allocation.

@jonas-schievink jonas-schievink added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 31, 2020
@steveklabnik
Copy link
Member

An update two and a half years later. This includes building dependencies:

rustc +nightly --version -v
rustc 1.48.0-nightly (e2be5f568 2020-09-09)
binary: rustc
commit-hash: e2be5f568d1f60365b825530f5b5cb722460591b
commit-date: 2020-09-09
host: x86_64-pc-windows-msvc
release: 1.48.0-nightly
LLVM version: 11.0cargo +nightly build
<snip>
    Finished dev [unoptimized + debuginfo] target(s) in 8m 42scargo +nightly build --release
<snip>

... so actually I think that this got worse. I left it running for the past 8 hours, overnight, and it shows no signs of stopping. THis is on my pretty beefy desktop too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants