[C++23] [Modules] [std module] Skip including standard headers if the std module is imported #80663

Open
ChuanqiXu9 opened this issue Feb 5, 2024 · 6 comments
Labels
clang:modules C++20 modules and Clang Header Modules libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.

Comments

@ChuanqiXu9
Member

ChuanqiXu9 commented Feb 5, 2024

There is an annoying issue called "include after import": https://clang.llvm.org/docs/StandardCPlusPlusModules.html#including-headers-after-import-is-problematic. Even after we fix that, including standard headers after importing the std module would still be problematic, since this style may increase the compilation time (and the size of the BMIs, if any are produced). The "include after import" issue itself is fundamental to clang, and it looks hard to fix completely and perfectly on the compiler side.

And I feel this is straightforward for people to understand:

module;
import std;
#include <vector>
#include <string>
export module M;
...

may have a larger BMI and compile more slowly than:

module;
import std;
export module M;
...

While the example looks silly, it is actually pretty common since the standard library can be used in other headers.

In a private meeting, an MSVC developer mentioned that MSVC has (or plans to have?) an extension to skip the standard headers if the std module is imported.

I feel this sounds good and can be pretty helpful to end users.

For the implementation, I don't have a complete design yet. The immediate ideas that come to mind are:

  • Implement this entirely on the library side.
  • Implement this entirely on the compiler side.
  • Implement this in both the compiler and the library.

Implementing this on the library side may require the library to provide an additional header that defines all the controlling macros, so that the user can (manually) import std like this:

import std;
#include <controlling_macros_for_std_headers>

#include "..."
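To make the library-only idea a bit more concrete, here is a minimal single-file sketch. It assumes the "controlling macros" are simply the include guards of the standard headers. The names DEMO_VECTOR_GUARD and DEMO_VECTOR_BODY_SEEN are hypothetical stand-ins, not real libc++ macros, and the "header" is inlined so the snippet compiles on its own.

```cpp
// Hypothetical stand-in for <controlling_macros_for_std_headers>:
// it pre-defines the include guard of a header that the std module covers.
#define DEMO_VECTOR_GUARD

// Hypothetical stand-in for the contents of a guarded library header.
// Because the guard is already defined, the preprocessor skips the body.
#ifndef DEMO_VECTOR_GUARD
#define DEMO_VECTOR_GUARD
#define DEMO_VECTOR_BODY_SEEN 1
// ... declarations would go here ...
#endif // DEMO_VECTOR_GUARD

// Reports whether the header body above was processed.
constexpr bool vector_body_was_processed() {
#ifdef DEMO_VECTOR_BODY_SEEN
  return true;
#else
  return false;
#endif
}

static_assert(!vector_body_was_processed(),
              "the guarded body is skipped once the guard is pre-defined");
```

With this shape the compiler still opens the header file, but skipping to the matching #endif is cheap compared to parsing the declarations.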

Implementing this on the compiler side may require hardcoding the filenames of all the standard headers and skipping entry into those headers.

Implementing this on both sides means that the library provides such a header and the compiler inserts it automatically.

This idea is still at an early stage. Any comments are welcome.

@ChuanqiXu9
Member Author

I think option 1 may be best since it can be extended to other libraries. I documented it in #80687.

@mordante
Member

I typed a long reply last week, but it seems I forgot to press comment :-/

I think there might be something possible on the library side. However the tricky part would be the macros. For example, feature-test macros, errno, and assert all require proper macro support. So I think #include <controlling_macros_for_std_headers> is not feasible since it might change the observable behavior.
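As a small illustration of why macros are the tricky part: named modules cannot export macros, so anything that the standard specifies as a macro only becomes visible through an #include. The sketch below uses ordinary includes (no modules) and just checks that assert and errno are macros; the helper function names are made up for this example.

```cpp
#include <cassert> // provides the assert macro
#include <cerrno>  // provides the errno macro

// True if assert is visible as a preprocessor macro at this point.
constexpr bool assert_is_a_macro() {
#ifdef assert
  return true;
#else
  return false;
#endif
}

// True if errno is visible as a preprocessor macro at this point.
constexpr bool errno_is_a_macro() {
#ifdef errno
  return true;
#else
  return false;
#endif
}

static_assert(assert_is_a_macro() && errno_is_a_macro(),
              "both names come from the headers as macros");
```

If the includes were silently skipped because the std module had been imported, neither macro would be defined and code using them would stop compiling.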

Maybe it would be possible to add a special pragma to a header to tell the compiler to stop processing. Something along the lines of

#ifndef _LIBCPP_FOO_H
#define _LIBCPP_FOO_H

#include <__config>
#include <...>

#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
#  pragma GCC system_header
#endif

// This will probably be wrapped in a macro
#pragma clang stop_processing_when_imported_module_std 

_LIBCPP_PUSH_MACROS
#include <__undef_macros>

_LIBCPP_BEGIN_NAMESPACE_STD

// implementation

_LIBCPP_END_NAMESPACE_STD

_LIBCPP_POP_MACROS

#endif // _LIBCPP_FOO_H

In that case the compiler can stop processing when the module std has been imported.
Stopping at this point means it's still in the #ifndef _LIBCPP_FOO_H so the pre-processor needs to assume there is a matching #endif. This could be tricky when the file is experimental and has an extra #ifdef like __chrono/tzdb.h.
If multiple nested #if makes things hard for the compiler we could exclude these files.

Would this be feasible to implement in the compiler?
(I've not consulted the other libc++ developers on their thoughts.)

@ChuanqiXu9
Member Author

> I think there might be something possible on the library side. However the tricky part would be the macros. For example, feature-test macros, errno, and assert all require proper macro support. So I think #include <controlling_macros_for_std_headers> is not feasible since it might change the observable behavior.

I don't understand this. In my mind, controlling_macros_for_std_headers wouldn't contain controlling macros for <version>, <assert> and <errno>. Would it still change the observable behavior in that case?

> In that case the compiler can stop processing when the module std has been imported.
> Stopping at this point means it's still in the #ifndef _LIBCPP_FOO_H so the pre-processor needs to assume there is a matching #endif. This could be tricky when the file is experimental and has an extra #ifdef like __chrono/tzdb.h.
> If multiple nested #if makes things hard for the compiler we could exclude these files.
>
> Would this be feasible to implement in the compiler?

Yeah, as you said, it does not look easy, because we need to find the last #endif. We may have to preprocess the whole file to find it. Then I think we would need a drastic change to the preprocessor to teach it to skip things... I feel this may not be feasible...

@mordante
Member

> > I think there might be something possible on the library side. However the tricky part would be the macros. For example, feature-test macros, errno, and assert all require proper macro support. So I think #include <controlling_macros_for_std_headers> is not feasible since it might change the observable behavior.
>
> I don't understand this. In my mind the controlling_macros_for_std_headers won't contain control macros for <version>, <assert> and <errno>. Will it still produce observable behavior in this way?

For example

module;
import std;
#include <vector>
#include <string>
export module M;

Inside the module, users can now use the FTM made available by <string> and <vector>, so the transformation should retain this observable behaviour. This seems very hard to do correctly. Clang 15 does not know which feature-test macros libc++ 16 provides, so it would need additional logic. Even that logic is error-prone, since libc++ might do things differently in a future version, which feels like unwanted coupling between Clang and libc++.
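A rough sketch of the kind of code that depends on this: the function below branches on a feature-test macro that <string> provides. It uses a plain include (no modules), and string_view_ftm is just an illustrative helper name. The value it returns depends on the library and the language mode, so no particular value is assumed here.

```cpp
#include <string>

// Value of the feature-test macro provided by <string>, or 0 if the
// library does not define it in the current language mode.
constexpr long string_view_ftm() {
#ifdef __cpp_lib_string_view
  return __cpp_lib_string_view;
#else
  return 0;
#endif
}

// Code like this silently changes meaning if the include is skipped and
// nothing else defines the macro.
constexpr bool can_use_string_view() { return string_view_ftm() != 0; }
```

If the include were dropped without anything re-creating the macro, can_use_string_view() would flip to false in a mode where it used to be true, which is the observable change described above.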

> > In that case the compiler can stop processing when the module std has been imported.
> > Stopping at this point means it's still in the #ifndef _LIBCPP_FOO_H so the pre-processor needs to assume there is a matching #endif. This could be tricky when the file is experimental and has an extra #ifdef like __chrono/tzdb.h.
> > If multiple nested #if makes things hard for the compiler we could exclude these files.
> > Would this be feasible to implement in the compiler?
>
> Yeah, as you said, it looks not easy due to we need to find the last #endif. We may have to preprocess the whole file to find the #endif. Then I think we need a drastic change to the preprocessor to teach it to skip things... I feel this may not be feasible...

I feared that, but I never looked very closely at Clang's preprocessor. Another solution would be:

#ifndef _LIBCPP_FOO_H
#define _LIBCPP_FOO_H

#include <__config>
#include <...>

#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
#  pragma GCC system_header
#endif

// This will be set by the compiler if `import std;` has been encountered in the translation unit.
#ifndef __clang_imported_std

_LIBCPP_PUSH_MACROS
#include <__undef_macros>

_LIBCPP_BEGIN_NAMESPACE_STD

// implementation

_LIBCPP_END_NAMESPACE_STD

_LIBCPP_POP_MACROS

#endif // __clang_imported_std

#endif // _LIBCPP_FOO_H

This still requires Clang to parse the entire file, but the pre-processor will "remove" the implementation when the proper module is imported. If this works it might be made more generic by using something like # !__has_imported(std), which could be used for other modules too, like #!__has_imported(fmt). This would allow other libraries to use the same optimizations as libc++. (Maybe other compiler vendors would like to do something similar.)

This is trivial to do in libc++ and I expect this shouldn't be too hard to implement in Clang. What do you think of this approach?
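For illustration only, the guard above can be simulated in a single file with ordinary macros. Everything prefixed DEMO_ is hypothetical: DEMO_IMPORTED_STD plays the role of __clang_imported_std, and DEMO_FOO_FEATURE stands for a feature-test macro. Placing that macro outside the inner guard is an assumption of this sketch, chosen so that the macro survives the skip.

```cpp
// Hypothetical stand-in for the macro the compiler would define after
// seeing the import of the std module.
#define DEMO_IMPORTED_STD 1

// --- begin stand-in for a library header ---
#ifndef DEMO_FOO_H
#define DEMO_FOO_H

// Macros stay outside the inner guard, so they remain visible.
#define DEMO_FOO_FEATURE 202401L

#ifndef DEMO_IMPORTED_STD
// Declarations the module already provides; skipped after the import.
#define DEMO_FOO_DECLARATIONS_SEEN 1
namespace demo { inline int foo() { return 42; } }
#endif // DEMO_IMPORTED_STD

#endif // DEMO_FOO_H
// --- end stand-in ---

// Reports whether the declarations inside the inner guard were processed.
constexpr bool declarations_were_processed() {
#ifdef DEMO_FOO_DECLARATIONS_SEEN
  return true;
#else
  return false;
#endif
}

// The macro defined outside the inner guard is still usable.
constexpr long foo_feature() { return DEMO_FOO_FEATURE; }

static_assert(!declarations_were_processed() && foo_feature() == 202401L,
              "declarations skipped, macro retained");
```

The preprocessor still has to scan the whole file to find the matching #endif, but the parser never sees the skipped declarations.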

@ChuanqiXu9
Member Author

> > > I think there might be something possible on the library side. However the tricky part would be the macros. For example, feature-test macros, errno, and assert all require proper macro support. So I think #include <controlling_macros_for_std_headers> is not feasible since it might change the observable behavior.
> >
> > I don't understand this. In my mind the controlling_macros_for_std_headers won't contain control macros for <version>, <assert> and <errno>. Will it still produce observable behavior in this way?
>
> For example
>
> module;
> import std;
> #include <vector>
> #include <string>
> export module M;
>
> Inside the module users can now use FTM available in <string> and <vector>. So the transformation should retain this observable behaviour. This seems to be very hard to do correctly. Clang 15 does not know which feature-test macros libc++16 provides. So it would need additional logic. Even that logic is error-prone since libc++ might do things different in a future version, which feels like an unwanted coupling between Clang and libc++.

(IIUC, FTM means macro definitions, right?)

Got your point. But in my mind this is not related to Clang. If we choose option 1, libc++ will provide the header I called controlling_macros_for_std_headers, and users will need to write #include <controlling_macros_for_std_headers> explicitly. The compiler is not involved, so we don't need to worry about version conflicts here.

(BTW, maybe we can include headers like in controlling_macros_for_std_headers too, it is not decided.)

> > > In that case the compiler can stop processing when the module std has been imported.
> > > Stopping at this point means it's still in the #ifndef _LIBCPP_FOO_H so the pre-processor needs to assume there is a matching #endif. This could be tricky when the file is experimental and has an extra #ifdef like __chrono/tzdb.h.
> > > If multiple nested #if makes things hard for the compiler we could exclude these files.
> > > Would this be feasible to implement in the compiler?
> >
> > Yeah, as you said, it looks not easy due to we need to find the last #endif. We may have to preprocess the whole file to find the #endif. Then I think we need a drastic change to the preprocessor to teach it to skip things... I feel this may not be feasible...
>
> I feared that, but I never looked very close at Clang's preprocessor. Another solution would be
>
> #ifndef _LIBCPP_FOO_H
> #define _LIBCPP_FOO_H
>
> #include <__config>
> #include <...>
>
> #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
> #  pragma GCC system_header
> #endif
>
> // This will be set by the compiler if `import std;` has been encountered in the translation unit.
> #ifndef __clang_imported_std
>
> _LIBCPP_PUSH_MACROS
> #include <__undef_macros>
>
> _LIBCPP_BEGIN_NAMESPACE_STD
>
> // implementation
>
> _LIBCPP_END_NAMESPACE_STD
>
> _LIBCPP_POP_MACROS
>
> #endif // __clang_imported_std
>
> #endif // _LIBCPP_FOO_H
>
> This still requires Clang to parse the entire file, but the pre-processor will "remove" the implementation when the proper module is imported. If this works it might be made more generic by using something like # !__has_imported(std), which could be used for other modules too, like #!__has_imported(fmt). This would allow other libaries to use the same optimizations as libc++. (Maybe other compiler vendors would like to do something similar.)
>
> This is trivial to do in libc++ and I expect this shouldn't be too hard to implement in Clang. What do you think of this approach?

On the one hand, it should be easy for the compiler to expose a macro definition such as <module-name>_module_imported, which users could then consume with something like #if defined(<module-name>_module_imported).

On the other hand, however, this requires the BMI to be present before preprocessing. Requiring that was an old bug in clang, which has since been fixed.

That is:

// foo.cpp
import a;
....

And the command

clang++ -std=c++20 -E foo.cpp -o -

will complain with something like "failed to find module a". This breaks the design at some level. I feel the preprocessor shouldn't be affected by named modules.

And this is why I prefer a library solution. A pure library solution is more flexible and generalizes to other libraries.

Maybe libc++ could also provide a header <std_module_imported> that defines the macro std_module_imported?

I guess you may feel it is not convenient for users to include an additional header, but I think it is reasonable and easy for users to understand. And users only need this if they want to mix includes and imports.
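A rough sketch of how a third-party header might consume such a macro, assuming the proposed header exists and defines std_module_imported (neither exists today). In this snippet the macro is not defined, so the textual includes are taken and the code compiles as ordinary C++. The function sum_first is only an illustrative name.

```cpp
// assert is a macro, which a named module cannot export, so <cassert> is
// included unconditionally.
#include <cassert>

// std_module_imported is the proposed macro; when a translation unit has
// imported std and included the proposed <std_module_imported> header
// beforehand, the textual includes below would be skipped.
#if !defined(std_module_imported)
#include <numeric>
#include <vector>
#endif

// Returns 1 + 2 + ... + n using standard library facilities.
inline int sum_first(int n) {
  assert(n >= 0);
  std::vector<int> values(static_cast<std::vector<int>::size_type>(n));
  std::iota(values.begin(), values.end(), 1);
  return std::accumulate(values.begin(), values.end(), 0);
}
```

The header itself does not import anything. It relies on the including translation unit having imported std before the include, which is the "mix includes and imports" situation.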
