-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[C++20] [Modules] Writing PCM will be unexpectedly slow with additional import. #61447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@llvm/issue-subscribers-clang-modules |
@ChuanqiXu9 I may be hitting this. I've got some files taking around 45 seconds to compile and it's killing me. I wonder if there is a temporary workaround that I could apply to LLVM source locally, without the more well-thought-out fix that you will need to apply. I'm interested in compilers and hacking on Clang, so I could try it. Can you say any more about what you found in |
It could help to upload a minimal reproducer and a flame graph. The most significant difference when importing a module is that @ChuanqiXu9 already linked a thread with further info: https://discourse.llvm.org/t/modules-increased-build-times/68755. |
@rnikander if you want to hack on it, you could try to modify: llvm-project/clang/lib/Serialization/ASTWriterDecl.cpp Lines 2490 to 2532 in e58a493
|
This might not be related to the problem.
I didn't find a quick solution/workaround for this. The direct cause for the problem is that for this code style:
When we merge the declarations from different modules, the declarations in this TU (a.cppm) will be the canonical declarations and the declarations from the imported modules will be the redeclarations. So we have to re-write the declarations in this TU. Also when we write declarations in this TU, we'll try to lookup the decls from the external sources (the imported modules), for example: llvm-project/clang/lib/Serialization/ASTWriter.cpp Lines 3960 to 3965 in edc8b60
This is what I found now. And as you see, it is pretty fundamental so I am not sure if there is a simple workaround. |
As a dirty (and absolutely incorrect) verifier, you can insert the following code to:
Note that I don't think this is the correct direction. I think we need to do something in the lookup process if possible. |
@rnikander and for the problem of slow compilation, I am wondering if some of the reason comes from the one phase compilation. For one phase compilation and two phase compilation, you can look this at: https://clang.llvm.org/docs/StandardCPlusPlusModules.html#how-to-produce-a-bmi and https://gitlab.kitware.com/cmake/cmake/-/issues/18355#note_1329192. And cmake choose to implement one phase model initially for the sake of compatibility. But two phase compilation may be faster locally. If the scale of your project isn't too large, you can try to implement the 2 phase compilation model by cmake scripts. Then we will have a better feeling for this. |
Okay, thank you. I'll dig around to try to understand better. I'd like to work on Clang but realistically I probably should focus on my projects. |
Never mind. This is what open source is : )
The deserialization and visibility checking are two related but different process. The things in global module fragment may be reachable so we have to deserialize them now. |
ASTReader after we start writing This is intended to mitigate #61447. Before the patch, it takes 5s to compile test.cppm in the above reproducer. After the patch it takes 3s to compile it. Although this patch didn't solve the problem completely, it should mitigate the problem for sure. Noted that the behavior of the patch is consistent with the comment of the originally empty function ASTReader::finalizeForWriting. So the change should be consistent with the original design.
I sent cf47e9f to mitigate the problem. The patch changes the writing BMI time of test.cppm from 5s to 3s in my environment. But 3s is still longer than I expected. So I'll keep this openning. |
Note for developers: the root cause for the issue is that when we reconcile the declcontext redundantly many times. See the last section of |
With I feel the problem may be somewhat pretty hard to fix (or optimize) in clang. Possibly we can only teach users don't do this. e.g. https://clang.llvm.org/docs/StandardCPlusPlusModules.html#performance-tips (Maybe we don't see so many issues if we're born within a world where modules came in day 1.) CC @koplas |
Oh, I took another look at the example,
Since we don't want to write the function body for non-inline function, we shouldn't have so many work in ASTWriter for sure. Although we can't solve the underlying issue, we can optimize the specific example. (It means it will be slow too if the function is inline) |
Thank you @ChuanqiXu9 for the progress that has been made. My problem lies in the usage of third-party libraries, that are used like this: module;
#include "big_header.hpp"
export module third_party;
export {
using decl_1;
using decl_2;
...
} The export-extern-style would certainly fix the performance issue for me, but it would hard to port all third-party libraries to this style. Even when std modules and #80663 reduce the impact, it can be a major performance hurdle for new module users. |
Yeah, this is more of the topic of how to transfer a header based library to a module based one. And the biggest challenge I saw is how to handle third party libraries. Then it is somewhat a annoying issue. We feel it will be easy if all the third party libraries provide the modules. Note that this definition is recursive. For example, if someday boost provides a modules, ideally, it will be best if it imports std instead of wrapping itself like std module. The std module is special since it is the root of dependencies. Another example is, now it may be fine to write a new module-native library only depends on std module. But the bad problem is, how can we treat the third party libraries not offering modules? This is annoying since we're doing other people's job. Ideally, it should be the job of other libraries to provide such thing nicely. But now we need to worry about that. Then it becomes more of an ecosystem problem. |
A bit of a question of understanding. Consider this: module;
#include <iostream>
export module test;
export {
void hello() {}
} The resulting pcm file compiled with reduced bmi is 3,4 Megabytes big. If I add Update: Without reduced bmi the respective file size is 4,5 and 7,4 Megabytes. The performance improvement seems to be linear to the file size. |
Reduced BMI Mitigate #61447 The root cause of the above problem is that when we write a declaration, we need to lookup all the redeclarations in the imported modules. Then it will be pretty slow if there are too many redeclarations in different modules. This patch doesn't solve the porblem. What the patchs mitigated is, when we writing a named module, we shouldn't write the declarations from GMF if it is unreferenced **in current module unit**. The difference here is that, if the declaration is used in the imported modules, we used to emit it as an update. But we definitely want to avoid that after this patch. For that reproducer in #61447, it used to take 2.5s to compile and now it only takes 0.49s to compile, which is a big win.
This is a problem of quality of implementation. The reduced BMI is still in the very early stage. We're still optimizing it. I just make a new patch to improve this. After this patch, with reduced BMI, the result of https://github.com/koplas/clang-modules-test goes to:
The number looks good and we're probably near the extreme. But I still want to emphasize that, it doesn't fix the fundamental issue. It will still be slow if we're writing a declaration which has a lot redeclarations in the imported modules. In another word, in the example of https://github.com/koplas/clang-modules-test, if every partition unit of module vulkan uses declarations from Also from my testing, it may not not a big herotic optimization to real world examples, https://github.com/alibaba/async_simple/tree/CXX20Modules (I tried to make it modules native), I only observed 1%+ size reduction and rough 2%+ compile speed improvements. |
I just tested it on my toy project, and the time spent in the frontend is now twice as large as the time spent in the ASTWriter. So I consider it a significant improvement, I guess a switch to std modules would reduce the remaining performance impact.
The file looks like this: module;
export module app;
import lime.Game;
export int app_main() {
lime::Game game = {};
return 0;
} If I use |
Thanks for testing this. Could you try to give a reproduced example for this crash? Then I can only actually fix it. BTW, what's the overall end-to-end improvement this change brings to your project? |
Using @rnikander Maybe you can provide some benchmarks, as you had some files that had far worse compile times.
I could reduce the lime.Game module to this: module;
export module lime.Game;
import lime.Render;
import entt;
export namespace lime {
class Game {
private:
entt::registry m_registry;
};
}// namespace lime However |
Thanks. It will be pretty helpful to improve the quality of the implementation. |
Just wanted to add another data point here after my first attempt at using modules. Briefly, I've modularized a library which is predominantly headers, and template-heavy. My initial approach has been to transform headers into interface partitions pretty much 1:1, with just a small number of distinct top-level modules. At this point I'm not using I'm also seeing some very slow compilation times (for the partition TUs themselves, the few implementation units I have, and most of all for downstream consuming TUs), and it is heavily weighted to The one thing I've noticed and am wondering if is on anyone's radar is memory consumption. From some rough eyeballing of processes, TUs for which clang++ caps out at around 400MB RAM usage in non-modular builds, are pushing up around 3GB in the modules build. One particular TU goes up to 6GB. I believe this is exacerbating the time spent in Note that |
With using modules, it is expected some more memory usages. But if it is a problem depends on the numbers. So my comment in the other issue you opened, the key idea here is to reduce the redeclarations. So it is pretty important to import third party libraries by |
We can find the reproducer at: https://discourse.llvm.org/t/modules-increased-build-times/68755/7
For short:
will take 3~4s to generate the PCM.
And
will take about 0.5s to complete only.
Note that:
will take 0.5s to complete too.
Currently I located in
ASTWriter::WriteDecl
. But I am not sure if this can be fixed simply.The text was updated successfully, but these errors were encountered: