Several trivial Vec<T> methods (deref, as_ref, etc) end up in compiled output when optimizing for size. #89389

Open
thomcc opened this issue Sep 30, 2021 · 7 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@thomcc
Member

thomcc commented Sep 30, 2021

Godbolt repro: https://rust.godbolt.org/z/caG4MEjsr

With -Copt-level=s and -Copt-level=z the functions <Vec<T> as Deref>::deref and <Vec<T> as AsRef<[T]>>::as_ref show up in the output pointlessly. Similarly for the mut versions (AsMut, DerefMut, etc), and a few similar methods I tried (Borrow::borrow for example).

There are probably some other similar issues too (it seems a lot of our library code is optimized for -Copt-level=3 rather than with size optimizations...).

Note that this can end up with a lot of copies of these functions, since you get at least one for each T you use Vec<T>::deref with, which is basically one for each T you put in a Vec.

Also note that these get fully optimized away on both -Copt-level=3 and -Copt-level=2.

One way to fix this is probably to make these #[inline(always)], but it seems weird that these would show up at all, so perhaps there's a better way which can fix the issue more broadly.
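As a rough illustration of the shape of code involved (a sketch only, not the contents of the Godbolt link): the first two functions call `Vec<T>`'s trait impls for two different `T`, so each needs its own instantiation of `deref` / `as_ref`. The `Buf<T>` wrapper and all function names are hypothetical; `Buf` only shows what the `#[inline(always)]` idea looks like on a user-defined `Deref` impl, since the std impls cannot be annotated from user code.

```rust
use std::ops::Deref;

// Two callers that go through `Vec<T>`'s trait impls for different `T`.
// Each distinct `T` is a separate monomorphization of `deref` / `as_ref`.
pub fn sum_u32(v: &Vec<u32>) -> u32 {
    v.deref().iter().sum()
}

pub fn len_u64(v: &Vec<u64>) -> usize {
    let s: &[u64] = v.as_ref();
    s.len()
}

// Hypothetical wrapper (not part of std), used only to show the
// `#[inline(always)]` idea on a `Deref` impl we control.
pub struct Buf<T>(pub Vec<T>);

impl<T> Deref for Buf<T> {
    type Target = [T];

    #[inline(always)]
    fn deref(&self) -> &[T] {
        self.0.as_slice()
    }
}

// Reaches `<[u8]>::first` through `Buf`'s `Deref` impl via auto-deref.
pub fn first_or_zero(b: &Buf<u8>) -> u8 {
    b.first().copied().unwrap_or(0)
}
```

Building something like this with `-Copt-level=s` or `-Copt-level=z` and inspecting the emitted symbols is one way to check whether a given toolchain still leaves the trivial trait methods out of line.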

@the8472
Member

the8472 commented Oct 2, 2021

I think godbolt is running rustc to build a .o. Doesn't linking strip these from the final artifact?

@thomcc
Member Author

thomcc commented Oct 2, 2021

Hm, yeah I see your point — for the godbolt link I posted, I would imagine something can remove those as they are never called.

I think the answer to your question is no, though: linking doesn't seem to strip them. At least, they showed up in some of twiggy's outputs, so I think so? I didn't bother investigating that hard after I managed to repro on godbolt.

FWIW, I somewhat recently complained about this issue on the community discord server, and someone (@saethlin, unless that username refers to two separate people) said (see footnote 1) that they had seen it before, and that it goes away if you emit IR and run it manually through opt.

That's too fiddly for me, so I didn't try it (I'd want to try fiddling with the passes list / inline threshold first)... but it indicates others hit this too.

Footnotes

  1. Modulo some reformatting, the actual conversation was:

    <saethlin>: Oh my it's that again
    <saethlin>: Btw you can get around this by running the IR through opt again
    <saethlin>: I don't know why LLVM needs 2 attempts to figure out out
    <saethlin>: Must be pass ordering
    <zuurr>: hmm
    <zuurr>: i'll see if i can do it with -Cpasses
    <zuurr>: really this doesnt matter, it just bugs me to see it in the output

    This is still true (that it doesn't matter for me), but s/z get used a lot more than might be obvious...

@saethlin
Member

saethlin commented Oct 3, 2021

Yes, you got the right person :)

I'm unclear if that godbolt link is demonstrating what you wanted. As far as I can tell, the problem is not that the symbols are left in the output. The problem is that these trivial functions aren't inlined.

In my experience, these examples appear with -Clto=fat and not with -Ccodegen-units=1. I suppose it's possible to hand-wave away your examples with size optimization because maybe the monomorphised function wasn't in the same codegen unit as its caller, but I cannot fathom any NOTABUG explanation for why fat LTO misses an inlining opportunity. I'm pretty sure these symbols can't be left behind because they're required by dynamic dispatch; if they were they wouldn't be cleaned up by -Ccodegen-units=1.

I can currently reproduce this with https://github.com/saethlin/fls at commit 2438a1579fe69fd85ff66b1fe1390f578dd9d08e, using

rustc 1.57.0-nightly (f03eb6bef 2021-10-02)
binary: rustc
commit-hash: f03eb6bef8ced8a243858b819e013b9caf83d757
commit-date: 2021-10-02
host: x86_64-unknown-linux-gnu
release: 1.57.0-nightly
LLVM version: 13.0.0

by running cargo +nightly bloat --release -n 100000 | less

I can find similar examples of this form (trivial function not inlined with -Clto=fat but inlined with -Ccodegen-units=1). Try it out on your favorite codebase, I bet you can find a few examples.

From looking at the calls specifically to <alloc::vec::Vec<T,A> as core::ops::deref::Deref>::deref, it looks like they are all in functions that have seen a lot of inlining and become quite massive already. I speculated that this might be pass ordering because I can imagine that Vec's Deref impl isn't considered for inlining until after some other operation has been done that drives the code size of the caller up so far that inlining this function is no longer deemed profitable by some heuristic, even though I'm pretty sure it would actually be profitable to inline this.

I have never fiddled with any LLVM flags in order to dig deeper. I really don't know what I'm doing there, and passing arguments to the LLVM components through cargo has always confused me.

@tmiasko
Contributor

tmiasko commented Oct 4, 2021

> With -Copt-level=s and -Copt-level=z the functions <Vec<T> as Deref>::deref and <Vec<T> as AsRef<[T]>>::as_ref show up in the output pointlessly.

@thomcc the opt-level 0, 1, s and z implicitly enable -Zshare-generics mode which makes generic code instantiated in a crate available to the downstream crates. This is likely responsible for what you are observing here.

> In my experience, these examples appear with -Clto=fat and not with -Ccodegen-units=1.

@saethlin could you retest with -Zmerge-functions=disabled? The MergeFunctions pass seems to introduce an IR shape that excludes the deref call site from consideration for inlining altogether.

@tmiasko
Contributor

tmiasko commented Oct 4, 2021

Test case replicating the situation created by the MergeFunctions pass (inlining would work with opaque pointer types):

; ModuleID = 'a.ll'
source_filename = "fls.4e5c6cbe-cgu.3"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: nounwind nonlazybind
define hidden i32 @f() unnamed_addr #0 {
  %1 = call { [0 x i32]*, i64 } bitcast ({ [0 x i64]*, i64 } ()* @g to { [0 x i32]*, i64 } ()*)() #0
  ret i32 0
}

; Function Attrs: nounwind nonlazybind
define hidden { [0 x i64]*, i64 } @g() unnamed_addr #0 {
  ret { [0 x i64]*, i64 } undef
}

attributes #0 = { nounwind nonlazybind "target-cpu"="x86-64" }
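For readers less used to LLVM IR, here is a hedged Rust-level sketch of how such a shape could arise. The function names are hypothetical, and whether a given toolchain actually merges these two functions is an assumption, not something this thread confirms. Both functions only read the vector's pointer and length, so their bodies can be instruction-for-instruction identical while their signatures differ in the slice element type, much like `g` being called through a `bitcast` above.

```rust
// Both functions only read the vector's data pointer and length; the element
// type affects the type of the returned slice, not the work being done.
pub fn slice_u32(v: &Vec<u32>) -> &[u32] {
    v.as_slice()
}

pub fn slice_u64(v: &Vec<u64>) -> &[u64] {
    v.as_slice()
}

// A caller playing the role of `f` in the IR test case. If a merging pass
// folded `slice_u32` into `slice_u64`, this call would have to go through a
// function-pointer cast under typed pointers (assumption for illustration).
pub fn count_u32(v: &Vec<u32>) -> usize {
    slice_u32(v).len()
}
```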

@rustbot modify labels: +A-llvm

@rustbot rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Oct 4, 2021
@saethlin
Member

saethlin commented Oct 4, 2021

@tmiasko I can confirm that for the examples that I have, -Zmerge-functions=disabled seems to fix the specific problem I'm mentioning here. However, adding that flag does something else confusing: I'm left with a lot of core::ptr::drop_in_place<&T> symbols. The number of those symbols in fls goes from 2 to 4 when this flag is added, and in a large internal codebase it goes from 1 to 151.

@saethlin
Member

I don't really know when this was fixed, but I'm not seeing the particular codegen that I complained about on 1.58.0 (latest stable) or nightly-2022-01-15 (latest nightly) anymore.

@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023

6 participants