Skip to content

[C++20] [Modules] Writing PCM will be unexpectedly slow with additional import. #61447

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
ChuanqiXu9 opened this issue Mar 16, 2023 · 24 comments
Assignees
Labels
clang:modules C++20 modules and Clang Header Modules

Comments

@ChuanqiXu9
Copy link
Member

ChuanqiXu9 commented Mar 16, 2023

We can find the reproducer at: https://discourse.llvm.org/t/modules-increased-build-times/68755/7

For short:

// test.cppm
module;

#include <iostream>

export module test;
import vulkan; // Vulcan is constructed big module.

will take 3~4s to generate the PCM.

And

module;
#include <iostream>
export module test.without.import;

will take about 0.5s to complete only.

Note that:

#include <iostream>
import vulkan;

will take 0.5s to complete too.

Currently I located in ASTWriter::WriteDecl. But I am not sure if this can be fixed simply.

@ChuanqiXu9 ChuanqiXu9 added the clang:modules C++20 modules and Clang Header Modules label Mar 16, 2023
@llvmbot
Copy link
Member

llvmbot commented Mar 16, 2023

@llvm/issue-subscribers-clang-modules

@rnikander
Copy link

@ChuanqiXu9 I may be hitting this. I've got some files taking around 45 seconds to compile and it's killing me. I wonder if there is a temporary workaround that I could apply to LLVM source locally, without the more well-thought-out fix that you will need to apply. I'm interested in compilers and hacking on Clang, so I could try it. Can you say any more about what you found in ASTWriter::WriteDecl that might be the problem?

@koplas
Copy link
Contributor

koplas commented Apr 6, 2023

It could help to upload a minimal reproducer and a flame graph. The most significant difference when importing a module is that clang::FindExternalVisibleDeclsByName gets called which seems to be slow with big modules.

@ChuanqiXu9 already linked a thread with further info: https://discourse.llvm.org/t/modules-increased-build-times/68755.
I only have files that take 5 seconds longer to compile and my reproducer is far from minimal, so a better reproducer could help.

@koplas
Copy link
Contributor

koplas commented Apr 6, 2023

@rnikander if you want to hack on it, you could try to modify:

/// isRequiredDecl - Check if this is a "required" Decl, which must be seen by
/// consumers of the AST.
///
/// Such decls will always be deserialized from the AST file, so we would like
/// this to be as restrictive as possible. Currently the predicate is driven by
/// code generation requirements, if other clients have a different notion of
/// what is "required" then we may have to consider an alternate scheme where
/// clients can iterate over the top-level decls and get information on them,
/// without necessary deserializing them. We could explicitly require such
/// clients to use a separate API call to "realize" the decl. This should be
/// relatively painless since they would presumably only do it for top-level
/// decls.
static bool isRequiredDecl(const Decl *D, ASTContext &Context,
Module *WritingModule) {
// Named modules have different semantics than header modules. Every named
// module units owns a translation unit. So the importer of named modules
// doesn't need to deserilize everything ahead of time.
if (WritingModule && WritingModule->isModulePurview()) {
// The PragmaCommentDecl and PragmaDetectMismatchDecl are MSVC's extension.
// And the behavior of MSVC for such cases will leak this to the module
// users. Given pragma is not a standard thing, the compiler has the space
// to do their own decision. Let's follow MSVC here.
if (isa<PragmaCommentDecl, PragmaDetectMismatchDecl>(D))
return true;
return false;
}
// An ObjCMethodDecl is never considered as "required" because its
// implementation container always is.
// File scoped assembly or obj-c or OMP declare target implementation must be
// seen.
if (isa<FileScopeAsmDecl, TopLevelStmtDecl, ObjCImplDecl>(D))
return true;
if (WritingModule && isPartOfPerModuleInitializer(D)) {
// These declarations are part of the module initializer, and are emitted
// if and when the module is imported, rather than being emitted eagerly.
return false;
}
return Context.DeclMustBeEmitted(D);
}

@ChuanqiXu9
Copy link
Member Author

@rnikander if you want to hack on it, you could try to modify:

This might not be related to the problem.

@ChuanqiXu9 I may be hitting this. I've got some files taking around 45 seconds to compile and it's killing me. I wonder if there is a temporary workaround that I could apply to LLVM source locally, without the more well-thought-out fix that you will need to apply. I'm interested in compilers and hacking on Clang, so I could try it. Can you say any more about what you found in ASTWriter::WriteDecl that might be the problem?

I didn't find a quick solution/workaround for this. The direct cause for the problem is that for this code style:

// a.cppm
module;
#include <iostream>
export module;
import another; // another contains many decls <iostream>

When we merge the declarations from different modules, the declarations in this TU (a.cppm) will be the canonical declarations and the declarations from the imported modules will be the redeclarations. So we have to re-write the declarations in this TU. Also when we write declarations in this TU, we'll try to lookup the decls from the external sources (the imported modules), for example:

// Next we need to do a lookup with each name into this decl context to fully
// populate any results from external sources. We don't actually use the
// results of these lookups because we only want to use the results after all
// results have been loaded and the pointers into them will be stable.
for (auto &Name : Names)
DC->lookup(Name);

This is what I found now. And as you see, it is pretty fundamental so I am not sure if there is a simple workaround.

@ChuanqiXu9
Copy link
Member Author

As a dirty (and absolutely incorrect) verifier, you can insert the following code to:

for (auto *RD : D->redecls()) {
      if (RD == D)
        continue;

      if (RD->isFromASTFile()) {
        DeclIDs[D] = RD->getGlobalID();
        return DeclIDs[D];
      }
}

Note that I don't think this is the correct direction. I think we need to do something in the lookup process if possible.

@ChuanqiXu9
Copy link
Member Author

@rnikander and for the problem of slow compilation, I am wondering if some of the reason comes from the one phase compilation. For one phase compilation and two phase compilation, you can look this at: https://clang.llvm.org/docs/StandardCPlusPlusModules.html#how-to-produce-a-bmi and https://gitlab.kitware.com/cmake/cmake/-/issues/18355#note_1329192. And cmake choose to implement one phase model initially for the sake of compatibility. But two phase compilation may be faster locally. If the scale of your project isn't too large, you can try to implement the 2 phase compilation model by cmake scripts. Then we will have a better feeling for this.

@rnikander
Copy link

Okay, thank you. I'll dig around to try to understand better. I'd like to work on Clang but realistically I probably should focus on my projects.
So I guess this process of merging declarations from modules, which you mentioned, has to occur even with the declarations in the another module that are private to it, like the #include <iostream> in it's global module fragment?
I can't compile anything at the moment because main branch is segfaulting (another issue, someone already reported). I'll look again tomorrow.

@ChuanqiXu9
Copy link
Member Author

I'd like to work on Clang but realistically I probably should focus on my projects.

Never mind. This is what open source is : )

So I guess this process of merging declarations from modules, which you mentioned, has to occur even with the declarations in the another module that are private to it, like the #include in it's global module fragment?

The deserialization and visibility checking are two related but different process. The things in global module fragment may be reachable so we have to deserialize them now.

@ChuanqiXu9 ChuanqiXu9 self-assigned this May 12, 2023
ChuanqiXu9 added a commit that referenced this issue May 12, 2023
ASTReader after we start writing

This is intended to mitigate
#61447.

Before the patch, it takes 5s to compile test.cppm in the above
reproducer. After the patch it takes 3s to compile it. Although this
patch didn't solve the problem completely, it should mitigate the
problem for sure. Noted that the behavior of the patch is consistent
with the comment of the originally empty function
ASTReader::finalizeForWriting. So the change should be consistent with
the original design.
@ChuanqiXu9
Copy link
Member Author

I sent cf47e9f to mitigate the problem. The patch changes the writing BMI time of test.cppm from 5s to 3s in my environment. But 3s is still longer than I expected. So I'll keep this openning.

@ChuanqiXu9
Copy link
Member Author

Note for developers: the root cause for the issue is that when we reconcile the declcontext redundantly many times. See the last section of ASTReader::loadDeclUpdateRecords. We will reconcile the decl context completely instead of by need, which may be a major reason for this issue.

@ChuanqiXu9
Copy link
Member Author

ChuanqiXu9 commented Apr 17, 2024

With -fexperimental-modules-reduced-bmi, the compile time for https://github.com/koplas/clang-modules-test goes down to 2.685s from 3.486s my local machine. So it can mitigate the problem a little bit but still not ideal.

I feel the problem may be somewhat pretty hard to fix (or optimize) in clang. Possibly we can only teach users don't do this. e.g. https://clang.llvm.org/docs/StandardCPlusPlusModules.html#performance-tips

(Maybe we don't see so many issues if we're born within a world where modules came in day 1.)

CC @koplas

@ChuanqiXu9
Copy link
Member Author

Oh, I took another look at the example,

module;

#include <iostream>


export module test;
import vulkan;

export {
    void hello() {
        std::cout << "hello\n";
    }
}

Since we don't want to write the function body for non-inline function, we shouldn't have so many work in ASTWriter for sure. Although we can't solve the underlying issue, we can optimize the specific example.

(It means it will be slow too if the function is inline)

@koplas
Copy link
Contributor

koplas commented Apr 17, 2024

Thank you @ChuanqiXu9 for the progress that has been made.

My problem lies in the usage of third-party libraries, that are used like this:

module;
#include "big_header.hpp"
export module third_party;
export {
    using decl_1;
    using decl_2;
    ...
}

The export-extern-style would certainly fix the performance issue for me, but it would hard to port all third-party libraries to this style. Even when std modules and #80663 reduce the impact, it can be a major performance hurdle for new module users.
To be able to import a lot of big third-party libraries without a big performance impact would be the biggest C++ modules selling point for me. I wouldn't mind if the ASTWriter is still slow for the inline function case.

@ChuanqiXu9
Copy link
Member Author

Yeah, this is more of the topic of how to transfer a header based library to a module based one. And the biggest challenge I saw is how to handle third party libraries. Then it is somewhat a annoying issue. We feel it will be easy if all the third party libraries provide the modules. Note that this definition is recursive. For example, if someday boost provides a modules, ideally, it will be best if it imports std instead of wrapping itself like std module. The std module is special since it is the root of dependencies. Another example is, now it may be fine to write a new module-native library only depends on std module.

But the bad problem is, how can we treat the third party libraries not offering modules? This is annoying since we're doing other people's job. Ideally, it should be the job of other libraries to provide such thing nicely. But now we need to worry about that.

Then it becomes more of an ecosystem problem.

@koplas
Copy link
Contributor

koplas commented Apr 17, 2024

A bit of a question of understanding. Consider this:

module;
#include <iostream>
export module test;

export {
    void hello() {}
}

The resulting pcm file compiled with reduced bmi is 3,4 Megabytes big. If I add import vulkan; it increases to 5,5 Megabytes. Shouldn't both files be relatively small, as they use nothing from the GMF?

Update: Without reduced bmi the respective file size is 4,5 and 7,4 Megabytes. The performance improvement seems to be linear to the file size.

ChuanqiXu9 added a commit that referenced this issue Apr 18, 2024
Reduced BMI

Mitigate #61447

The root cause of the above problem is that when we write a declaration,
we need to lookup all the redeclarations in the imported modules. Then
it will be pretty slow if there are too many redeclarations in different
modules. This patch doesn't solve the porblem.

What the patchs mitigated is, when we writing a named module, we shouldn't
write the declarations from GMF if it is unreferenced **in current
module unit**. The difference here is that, if the declaration is used
in the imported modules, we used to emit it as an update. But we
definitely want to avoid that after this patch.

For that reproducer in
#61447, it used to take 2.5s
to compile and now it only takes 0.49s to compile, which is a big win.
@ChuanqiXu9
Copy link
Member Author

ChuanqiXu9 commented Apr 18, 2024

The resulting pcm file compiled with reduced bmi is 3,4 Megabytes big. If I add import vulkan; it increases to 5,5 Megabytes. Shouldn't both files be relatively small, as they use nothing from the GMF?

This is a problem of quality of implementation. The reduced BMI is still in the very early stage. We're still optimizing it.
And this the problem I said why reduced BMI doesn't work. Since we don't write entities in non-inline function bodies, we simply shouldn't write any thing.

I just make a new patch to improve this. After this patch, with reduced BMI, the result of https://github.com/koplas/clang-modules-test goes to:

Building test.cppm

real	0m0.494s
user	0m0.450s
sys	0m0.037s
Building test-without-import.cppm

real	0m0.466s
user	0m0.430s
sys	0m0.033s

The number looks good and we're probably near the extreme. import <module-name> is not a free operation.

But I still want to emphasize that, it doesn't fix the fundamental issue. It will still be slow if we're writing a declaration which has a lot redeclarations in the imported modules. In another word, in the example of https://github.com/koplas/clang-modules-test, if every partition unit of module vulkan uses declarations from <iostream> extensively (not in a non-inline function body, which we just skipped), the writing performance should be pretty slow still. So I don't mind keep the issue opening.

Also from my testing, it may not not a big herotic optimization to real world examples, https://github.com/alibaba/async_simple/tree/CXX20Modules (I tried to make it modules native), I only observed 1%+ size reduction and rough 2%+ compile speed improvements.

@koplas
Copy link
Contributor

koplas commented Apr 18, 2024

I just tested it on my toy project, and the time spent in the frontend is now twice as large as the time spent in the ASTWriter. So I consider it a significant improvement, I guess a switch to std modules would reduce the remaining performance impact.
Currently one file crashes with -fexperimental-modules-reduced-bmi:

/usr/local/bin/clang++  -I/home/paul/Documents/lime-engine/third_party/include -I/home/paul/Documents/lime-engine/third_party/glm -I/home/paul/Documents/lime-engine/third_party/glfw/include -stdlib=libc++ -Wall -ftime-trace -fno-rtti -mavx2 -fno-exceptions -fexperimental-modules-reduced-bmi -fomit-frame-pointer -fno-exceptions -fno-asynchronous-unwind-tables -Wno-nullability-completeness -ferror-limit=0 -g -std=gnu++20 -fcolor-diagnostics -MD -MT CMakeFiles/lime-engine.dir/src/app.ixx.o -MF CMakeFiles/lime-engine.dir/src/app.ixx.o.d @CMakeFiles/lime-engine.dir/src/app.ixx.o.modmap -o CMakeFiles/lime-engine.dir/src/app.ixx.o -c /home/paul/Documents/lime-engine/src/app.ixx
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /usr/local/bin/clang++ -I/home/paul/Documents/lime-engine/third_party/include -I/home/paul/Documents/lime-engine/third_party/glm -I/home/paul/Documents/lime-engine/third_party/glfw/include -stdlib=libc++ -Wall -ftime-trace -fno-rtti -mavx2 -fno-exceptions -fexperimental-modules-reduced-bmi -fomit-frame-pointer -fno-exceptions -fno-asynchronous-unwind-tables -Wno-nullability-completeness -ferror-limit=0 -g -std=gnu++20 -fcolor-diagnostics -MD -MT CMakeFiles/lime-engine.dir/src/app.ixx.o -MF CMakeFiles/lime-engine.dir/src/app.ixx.o.d @CMakeFiles/lime-engine.dir/src/app.ixx.o.modmap -o CMakeFiles/lime-engine.dir/src/app.ixx.o -c /home/paul/Documents/lime-engine/src/app.ixx
1.	<eof> parser at end of file
2.	Per-file LLVM IR generation
3.	/usr/local/bin/../include/c++/v1/new:268:29: Generating code for declaration 'std::__libcpp_operator_new'
 #0 0x00005599ecbb4496 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/bin/clang+++0x3025496)
 #1 0x00005599ecbb207e llvm::sys::RunSignalHandlers() (/usr/local/bin/clang+++0x302307e)
 #2 0x00005599ecb23450 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007faf8d65a770 (/usr/lib/libc.so.6+0x40770)
 #4 0x00005599ec6f1af4 llvm::Value::getContext() const (/usr/local/bin/clang+++0x2b62af4)
 #5 0x00005599ec68275c llvm::StoreInst::StoreInst(llvm::Value*, llvm::Value*, bool, llvm::Align, llvm::AtomicOrdering, unsigned char, llvm::Instruction*) (/usr/local/bin/clang+++0x2af375c)
 #6 0x00005599ec6826cf llvm::StoreInst::StoreInst(llvm::Value*, llvm::Value*, bool, llvm::Align, llvm::Instruction*) (/usr/local/bin/clang+++0x2af36cf)
 #7 0x00005599eaacd3be llvm::IRBuilderBase::CreateAlignedStore(llvm::Value*, llvm::Value*, llvm::MaybeAlign, bool) (/usr/local/bin/clang+++0xf3e3be)
 #8 0x00005599ece029d5 clang::CodeGen::CodeGenFunction::EmitReturnStmt(clang::ReturnStmt const&) (/usr/local/bin/clang+++0x32739d5)
 #9 0x00005599ece0b360 clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope(clang::CompoundStmt const&, bool, clang::CodeGen::AggValueSlot) (/usr/local/bin/clang+++0x327c360)
#10 0x00005599ece65d06 clang::CodeGen::CodeGenFunction::GenerateCode(clang::GlobalDecl, llvm::Function*, clang::CodeGen::CGFunctionInfo const&) (/usr/local/bin/clang+++0x32d6d06)
#11 0x00005599ece889af clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(clang::GlobalDecl, llvm::GlobalValue*) (/usr/local/bin/clang+++0x32f99af)
#12 0x00005599ece817a1 clang::CodeGen::CodeGenModule::EmitGlobalDefinition(clang::GlobalDecl, llvm::GlobalValue*) (/usr/local/bin/clang+++0x32f27a1)
#13 0x00005599ece74dc3 clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5dc3)
#14 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#15 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#16 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#17 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#18 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#19 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#20 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#21 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#22 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#23 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#24 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#25 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#26 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#27 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#28 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#29 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#30 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#31 0x00005599ece7259b clang::CodeGen::CodeGenModule::Release() (/usr/local/bin/clang+++0x32e359b)
#32 0x00005599ed25fc18 (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit(clang::ASTContext&) ModuleBuilder.cpp:0:0
#33 0x00005599ed25700b clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/usr/local/bin/clang+++0x36c800b)
#34 0x00005599ed53f7bc clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) (/usr/local/bin/clang+++0x39b07bc)
#35 0x00005599eed04549 clang::ParseAST(clang::Sema&, bool, bool) (/usr/local/bin/clang+++0x5175549)
#36 0x00005599ed4fbf14 clang::FrontendAction::Execute() (/usr/local/bin/clang+++0x396cf14)
#37 0x00005599ed464854 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/usr/local/bin/clang+++0x38d5854)
#38 0x00005599ed5c8d50 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/usr/local/bin/clang+++0x3a39d50)
#39 0x00005599eaa81f21 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/local/bin/clang+++0xef2f21)
#40 0x00005599eaa7f1d0 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#41 0x00005599ed2a6b29 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#42 0x00005599ecb231ec llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/usr/local/bin/clang+++0x2f941ec)
#43 0x00005599ed2a64c3 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (/usr/local/bin/clang+++0x37174c3)
#44 0x00005599ed2646a2 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/usr/local/bin/clang+++0x36d56a2)
#45 0x00005599ed2648fe clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/usr/local/bin/clang+++0x36d58fe)
#46 0x00005599ed28402f clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/usr/local/bin/clang+++0x36f502f)
#47 0x00005599eaa7e857 clang_main(int, char**, llvm::ToolContext const&) (/usr/local/bin/clang+++0xeef857)
#48 0x00005599eaa8d486 main (/usr/local/bin/clang+++0xefe486)
#49 0x00007faf8d643cd0 (/usr/lib/libc.so.6+0x29cd0)
#50 0x00007faf8d643d8a __libc_start_main (/usr/lib/libc.so.6+0x29d8a)
#51 0x00005599eaa7cc75 _start (/usr/local/bin/clang+++0xeedc75)

The file looks like this:

module;

export module app;

import lime.Game;

export int app_main() {
    lime::Game game = {};
    return 0;
}

If I use lime::Game *game = new lime::Game{}; the crash doesn't occur.

@ChuanqiXu9
Copy link
Member Author

ChuanqiXu9 commented Apr 18, 2024

I just tested it on my toy project, and the time spent in the frontend is now twice as large as the time spent in the ASTWriter. So I consider it a significant improvement, I guess a switch to std modules would reduce the remaining performance impact. Currently one file crashes with -fexperimental-modules-reduced-bmi:

/usr/local/bin/clang++  -I/home/paul/Documents/lime-engine/third_party/include -I/home/paul/Documents/lime-engine/third_party/glm -I/home/paul/Documents/lime-engine/third_party/glfw/include -stdlib=libc++ -Wall -ftime-trace -fno-rtti -mavx2 -fno-exceptions -fexperimental-modules-reduced-bmi -fomit-frame-pointer -fno-exceptions -fno-asynchronous-unwind-tables -Wno-nullability-completeness -ferror-limit=0 -g -std=gnu++20 -fcolor-diagnostics -MD -MT CMakeFiles/lime-engine.dir/src/app.ixx.o -MF CMakeFiles/lime-engine.dir/src/app.ixx.o.d @CMakeFiles/lime-engine.dir/src/app.ixx.o.modmap -o CMakeFiles/lime-engine.dir/src/app.ixx.o -c /home/paul/Documents/lime-engine/src/app.ixx
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /usr/local/bin/clang++ -I/home/paul/Documents/lime-engine/third_party/include -I/home/paul/Documents/lime-engine/third_party/glm -I/home/paul/Documents/lime-engine/third_party/glfw/include -stdlib=libc++ -Wall -ftime-trace -fno-rtti -mavx2 -fno-exceptions -fexperimental-modules-reduced-bmi -fomit-frame-pointer -fno-exceptions -fno-asynchronous-unwind-tables -Wno-nullability-completeness -ferror-limit=0 -g -std=gnu++20 -fcolor-diagnostics -MD -MT CMakeFiles/lime-engine.dir/src/app.ixx.o -MF CMakeFiles/lime-engine.dir/src/app.ixx.o.d @CMakeFiles/lime-engine.dir/src/app.ixx.o.modmap -o CMakeFiles/lime-engine.dir/src/app.ixx.o -c /home/paul/Documents/lime-engine/src/app.ixx
1.	<eof> parser at end of file
2.	Per-file LLVM IR generation
3.	/usr/local/bin/../include/c++/v1/new:268:29: Generating code for declaration 'std::__libcpp_operator_new'
 #0 0x00005599ecbb4496 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/bin/clang+++0x3025496)
 #1 0x00005599ecbb207e llvm::sys::RunSignalHandlers() (/usr/local/bin/clang+++0x302307e)
 #2 0x00005599ecb23450 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007faf8d65a770 (/usr/lib/libc.so.6+0x40770)
 #4 0x00005599ec6f1af4 llvm::Value::getContext() const (/usr/local/bin/clang+++0x2b62af4)
 #5 0x00005599ec68275c llvm::StoreInst::StoreInst(llvm::Value*, llvm::Value*, bool, llvm::Align, llvm::AtomicOrdering, unsigned char, llvm::Instruction*) (/usr/local/bin/clang+++0x2af375c)
 #6 0x00005599ec6826cf llvm::StoreInst::StoreInst(llvm::Value*, llvm::Value*, bool, llvm::Align, llvm::Instruction*) (/usr/local/bin/clang+++0x2af36cf)
 #7 0x00005599eaacd3be llvm::IRBuilderBase::CreateAlignedStore(llvm::Value*, llvm::Value*, llvm::MaybeAlign, bool) (/usr/local/bin/clang+++0xf3e3be)
 #8 0x00005599ece029d5 clang::CodeGen::CodeGenFunction::EmitReturnStmt(clang::ReturnStmt const&) (/usr/local/bin/clang+++0x32739d5)
 #9 0x00005599ece0b360 clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope(clang::CompoundStmt const&, bool, clang::CodeGen::AggValueSlot) (/usr/local/bin/clang+++0x327c360)
#10 0x00005599ece65d06 clang::CodeGen::CodeGenFunction::GenerateCode(clang::GlobalDecl, llvm::Function*, clang::CodeGen::CGFunctionInfo const&) (/usr/local/bin/clang+++0x32d6d06)
#11 0x00005599ece889af clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(clang::GlobalDecl, llvm::GlobalValue*) (/usr/local/bin/clang+++0x32f99af)
#12 0x00005599ece817a1 clang::CodeGen::CodeGenModule::EmitGlobalDefinition(clang::GlobalDecl, llvm::GlobalValue*) (/usr/local/bin/clang+++0x32f27a1)
#13 0x00005599ece74dc3 clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5dc3)
#14 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#15 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#16 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#17 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#18 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#19 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#20 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#21 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#22 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#23 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#24 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#25 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#26 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#27 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#28 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#29 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#30 0x00005599ece74ddf clang::CodeGen::CodeGenModule::EmitDeferred() (/usr/local/bin/clang+++0x32e5ddf)
#31 0x00005599ece7259b clang::CodeGen::CodeGenModule::Release() (/usr/local/bin/clang+++0x32e359b)
#32 0x00005599ed25fc18 (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit(clang::ASTContext&) ModuleBuilder.cpp:0:0
#33 0x00005599ed25700b clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/usr/local/bin/clang+++0x36c800b)
#34 0x00005599ed53f7bc clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) (/usr/local/bin/clang+++0x39b07bc)
#35 0x00005599eed04549 clang::ParseAST(clang::Sema&, bool, bool) (/usr/local/bin/clang+++0x5175549)
#36 0x00005599ed4fbf14 clang::FrontendAction::Execute() (/usr/local/bin/clang+++0x396cf14)
#37 0x00005599ed464854 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/usr/local/bin/clang+++0x38d5854)
#38 0x00005599ed5c8d50 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/usr/local/bin/clang+++0x3a39d50)
#39 0x00005599eaa81f21 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/local/bin/clang+++0xef2f21)
#40 0x00005599eaa7f1d0 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#41 0x00005599ed2a6b29 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#42 0x00005599ecb231ec llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/usr/local/bin/clang+++0x2f941ec)
#43 0x00005599ed2a64c3 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (/usr/local/bin/clang+++0x37174c3)
#44 0x00005599ed2646a2 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/usr/local/bin/clang+++0x36d56a2)
#45 0x00005599ed2648fe clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/usr/local/bin/clang+++0x36d58fe)
#46 0x00005599ed28402f clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/usr/local/bin/clang+++0x36f502f)
#47 0x00005599eaa7e857 clang_main(int, char**, llvm::ToolContext const&) (/usr/local/bin/clang+++0xeef857)
#48 0x00005599eaa8d486 main (/usr/local/bin/clang+++0xefe486)
#49 0x00007faf8d643cd0 (/usr/lib/libc.so.6+0x29cd0)
#50 0x00007faf8d643d8a __libc_start_main (/usr/lib/libc.so.6+0x29d8a)
#51 0x00005599eaa7cc75 _start (/usr/local/bin/clang+++0xeedc75)

The file looks like this:

module;

export module app;

import lime.Game;

export int app_main() {
    lime::Game game = {};
    return 0;
}

If I use lime::Game *game = new lime::Game{}; the crash doesn't occur.

Thanks for testing this. Could you try to give a reproduced example for this crash? Then I can only actually fix it.


BTW, what's the overall end-to-end improvement this change brings to your project?

@koplas
Copy link
Contributor

koplas commented Apr 21, 2024

BTW, what's the overall end-to-end improvement this change brings to your project?

Using -fexperimental-modules-reduced-bmi reduced the compile time of a single file from 2.94s to 2.45s. The WriteAST reduced from 1.2s to 0.87s. The PCM file size decreased from 20 MB to 16.3 MB. The reduction is similar to other module files. A large amount of time is spent on the frontend with std headers, which could be reduced with std modules.

@rnikander Maybe you can provide some benchmarks, as you had some files that had far worse compile times.

Thanks for testing this. Could you try to give a reproduced example for this crash? Then I can only actually fix it.

I could reduce the lime.Game module to this:

module;

export module lime.Game;

import lime.Render;
import entt;

export namespace lime {
    class Game {
    private:
        entt::registry m_registry;
    };
}// namespace lime

However lime.Render and entt are quite large so a minimal reproducer would take some time to create.

@koplas
Copy link
Contributor

koplas commented Apr 21, 2024

I just tested std modules with reduced bmi. Here is the trace without std modules:
image
And here with std modules:
image
This looks like the performance I expected from modules. As I no longer see an exponential compile time increase the issue is solved for me. The crash also doesn't occur if I link with std modules.

@ChuanqiXu9
Copy link
Member Author

However lime.Render and entt are quite large so a minimal reproducer would take some time to create.

Thanks. It will be pretty helpful to improve the quality of the implementation.

@kamrann
Copy link

kamrann commented Jun 29, 2024

Just wanted to add another data point here after my first attempt at using modules. Briefly, I've modularized a library which is predominantly headers, and template-heavy. My initial approach has been to transform headers into interface partitions pretty much 1:1, with just a small number of distinct top-level modules. At this point I'm not using import std, so I get that it's far from ideal as there is heavy duplication of STL includes in GMFs.

I'm also seeing some very slow compilation times (for the partition TUs themselves, the few implementation units I have, and most of all for downstream consuming TUs), and it is heavily weighted to WriteAST as mentioned above.

The one thing I've noticed and am wondering if is on anyone's radar is memory consumption. From some rough eyeballing of processes, TUs for which clang++ caps out at around 400MB RAM usage in non-modular builds, are pushing up around 3GB in the modules build. One particular TU goes up to 6GB. I believe this is exacerbating the time spent in WriteAST in the case of parallel builds due to memory paging - occasionally I've seen really absurd times (10s of minutes, all falling within WriteAST) when the memory contention has been particularly bad.

Note that -fexperimental-modules-reduced-bmi definitely mitigates this - they're generally capping out around 1.8GB (vs 3GB), and the absurdly long compiles disappear. However this is still a number of times higher memory usage than without modules, and compilation times, while better than with the fat BMI, are still problematic.

@ChuanqiXu9
Copy link
Member Author

With using modules, it is expected some more memory usages. But if it is a problem depends on the numbers. So my comment in the other issue you opened, the key idea here is to reduce the redeclarations. So it is pretty important to import third party libraries by import std instead of including them everywhere. And -fexperimental-modules-reduced-bmi will be helpful too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules
Projects
None yet
Development

No branches or pull requests

5 participants