Skip to content

[C++20] Unintuitive compile times when caching the result of constexpr/consteval in modules #62796

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kaimfrai opened this issue May 18, 2023 · 7 comments
Assignees
Labels
clang:modules C++20 modules and Clang Header Modules

Comments

@kaimfrai
Copy link

kaimfrai commented May 18, 2023

This may or may not be related to #61425.
I have some meta-programming facilities that do the following: A templated constexpr variable calls a yet to be defined function via ADL and caches the result. In separate modules, these functions are defined for 1 concrete argument. There is a second templated argument which can change how the result is computed from previous results. These functions may depend on the result of other computations. In each module, the result for the Default Strategy is pre-computed. To illustrate this, I've substituted it with the Fibonacci sequence, going from 30 to 32 to get noticable compile times.

Cache.cppm:

export module Fibonacci.Cache;

export namespace Fibonacci
{
	constexpr unsigned long Recursive(unsigned long n)
	{
		if (n == 0)
			return 0;
		if (n == 1)
			return 1;
		return Recursive(n - 2) + Recursive(n - 1);
	}

	template<unsigned long N>
	struct Number{};

	struct DefaultStrategy
	{
		constexpr unsigned long operator()(unsigned long n, auto... other) const
		{
			return (n + ... + other);
		}
	};

	template<unsigned long N, typename Strategy = DefaultStrategy>
	constexpr unsigned long Cache = Compute(Number<N>{}, Strategy{});
}

Fib0.cppm:

export module Fibonacci.Fib0;

export import Fibonacci.Cache;

export namespace Fibonacci
{
	constexpr unsigned long Compute(Number<30ul>, auto strategy)
	{
		return strategy(Recursive(30ul));
	}

	template constexpr unsigned long Cache<30ul>;
}

Fib1.cppm:

export module Fibonacci.Fib1;

export import Fibonacci.Cache;

export namespace Fibonacci
{
	constexpr unsigned long Compute(Number<31ul>, auto strategy)
	{
		return strategy(Recursive(31ul));
	}

	template constexpr unsigned long Cache<31ul>;
}

Fib2.cppm:

export module Fibonacci.Fib2;

export import Fibonacci.Fib0;
export import Fibonacci.Fib1;

export namespace Fibonacci
{
	constexpr unsigned long Compute(Number<32ul>, auto strategy)
	{
		return strategy(Cache<30ul>, Cache<31ul>);
	}

	template constexpr unsigned long Cache<32ul>;
}

Main.cpp:

import Fibonacci.Fib2;

int main()
{
	return Fibonacci::Cache<32ul> > (1ul << 32ul);
}

Compiled with the following commands on a recent Clang 17 build:

time clang++ -std=c++20 Cache.cppm --precompile -o Cache.pcm -ftime-trace\
&& time clang++ -std=c++20 -c Cache.pcm\
&& time clang++ -std=c++20 Fib0.cppm --precompile -o Fib0.pcm -fmodule-file=Fibonacci.Cache=Cache.pcm -fconstexpr-steps=4294967295 -ftime-trace\
&& time clang++ -std=c++20 -c Fib0.pcm\
&& time clang++ -std=c++20 Fib1.cppm --precompile -o Fib1.pcm -fmodule-file=Fibonacci.Cache=Cache.pcm -fconstexpr-steps=4294967295 -ftime-trace\
&& time clang++ -std=c++20 -c Fib1.pcm\
&& time clang++ -std=c++20 Fib2.cppm --precompile -o Fib2.pcm -fmodule-file=Fibonacci.Fib0=Fib0.pcm -fmodule-file=Fibonacci.Fib1=Fib1.pcm -fconstexpr-steps=4294967295 -ftime-trace\
&& time clang++ -std=c++20 -c Fib2.pcm\
&& time clang++ -std=c++20 -c Main.cpp -fmodule-file=Fibonacci.Fib2=Fib2.pcm -fconstexpr-steps=4294967295 -ftime-trace\
&& time clang++ -std=c++20 Main.o Cache.o Fib0.o Fib1.o Fib2.o

Now what I would expect is that in Fib0 the cache for 30 gets computed and in Fib1 the cache for 31 gets computed, each using the default strategy. The results get re-used in Fib2 and the result of Fib2 gets re-used in Main. What ends up happening is that in Fib2 both the cache for 30 and 31 are computed once again and in Main, the result of Fib2 gets re-computed once again.

Case 1: All constexpr

Fib0: 4.6 sec
Fib1: 7.3 sec
Fib2: 11.8 sec
Main: 11.9 sec

However, if we substitute some consteval for constexpr it gets more unintuitive:

Case 2: Making Recursive consteval

Fib0: 9.2 sec
Fib1: 14.7 sec
Fib2: 0.02 sec
Main: 0.02 sec

Case 3: Making DefaultStrategy::operator() consteval:

Fib0: 4.6 sec
Fib1: 7.4 sec
Fib2: 0.02 sec
Main: 0.02 sec
(This is what I expected in every case)

Case 4: Making Recursive and DefaultStrategy::operator() consteval

Fib0: 9.2 sec
Fib1: 14.9 sec
Fib2: 0.02 sec
Main: 0.02 sec

Case 5: Making each Compute consteval

Fib0: 9.1 sec
Fib1: 14.5 sec
Fib2: 0.02 sec
Main: 0.02 sec

Case 6: Making Recursive and each Compute consteval

Fib0: 9.2 sec
Fib1: 14.6 sec
Fib2: 0.02 sec
Main: 0.02 sec

Case 7: Making DefaultStrategy::operator() and each Compute consteval

Fib0: 9.1 sec
Fib1: 14.8 sec
Fib2: 0.02 sec
Main: 0.02 sec

Case 8: Making all consteval

Fib0: 9.0 sec
Fib1: 14.8 sec
Fib2: 0.02 sec
Main: 0.02 sec

What makes this so confusing is that replacing constexpr with consteval can be EITHER an optimization OR a pessimization, and I cannot derive a good rule from this. Ideally, I would like all cases to behave like Case 3, which has by far the lowest compilation time. If for some reason, the above behaviour is working as intended, I would appreciate some guiding rules when consteval is beneficial or detrimental to compile times.

@EugeneZelenko EugeneZelenko added clang:modules C++20 modules and Clang Header Modules and removed new issue labels May 18, 2023
@llvmbot
Copy link
Member

llvmbot commented May 18, 2023

@llvm/issue-subscribers-clang-modules

@ChuanqiXu9
Copy link
Member

(I didn't look into the details carefully)

What is the compiler version do you use? Since recently we have a fix for a similar issue. See #61226

@ChuanqiXu9 ChuanqiXu9 self-assigned this May 19, 2023
@kaimfrai
Copy link
Author

Here's my clang++ --version:

Ubuntu clang version 17.0.0 (++20230516093312+7f7bbc73175d-1~exp1~20230516213445.70)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

I have had the issue last friday or saturday, if that helps. I didn't find the time to report until yesterday. Unfortunately I haven't tried this approach before that day.

Just in case I updated to

Ubuntu clang version 17.0.0 (++20230519053409+e21a90f09172-1~exp1~20230519053457.75)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

The issue remains, however now I get the warning
warning: it is deprecated to read module 'Fibonacci.Cache' implcitly; it is going to be removed in clang18; consider to specify the dependencies explicitly [-Wread-modules-implicitly]
so I either I didn't specify the command correctly (I assumed -fmodule-file=Module.Name=module.pcm was explicitly specifying the dependency), or it triggers a false positive.

On a side note: Typo implcitly

@ChuanqiXu9
Copy link
Member

Got it. I'll try to take a look.

however now I get the warning
warning: it is deprecated to read module 'Fibonacci.Cache' implcitly; it is going to be removed in clang18; consider to specify the dependencies explicitly [-Wread-modules-implicitly]
so I either I didn't specify the command correctly (I assumed -fmodule-file=Module.Name=module.pcm was explicitly specifying the dependency), or it triggers a false positive.

This is a warning for build systems. If you're interested, you can take a look at #62707 and #62707. For personal uses, I think you can turn off the warning simply.

@ChuanqiXu9
Copy link
Member

This should be a compiler defect. I'll try to fix this.

@kaimfrai
Copy link
Author

Thanks for the quick fix! Unfortunately, after updating to
Ubuntu clang version 17.0.0 (++20230525053731+bcb698bfdc72-1~exp1~20230525053838.88)
only one of these cases appears to be resolved, the all-constexpr case. The other cases involving one or more consteval (except of course case 3) still take about twice as long to compile. At the very least, I'll be able to always use constexpr and get the lowest possible compile time now, so I'm happy with that.

@ChuanqiXu9
Copy link
Member

Thanks for the quick fix! Unfortunately, after updating to
Ubuntu clang version 17.0.0 (++20230525053731+bcb698bfdc72-1~exp1~20230525053838.88)
only one of these cases appears to be resolved, the all-constexpr case. The other cases involving one or more consteval (except of course case 3) still take about twice as long to compile. At the very least, I'll be able to always use constexpr and get the lowest possible compile time now, so I'm happy with that.

I checked this with non-modular codes. And it shows this issue exists with non-modules code. I've filed an issue for this: #62947

veselypeta pushed a commit to veselypeta/cherillvm that referenced this issue Aug 23, 2024
Close llvm/llvm-project#62796.

Previously, we didn't serialize the evaluated result for VarDecl. This
caused the compilation of template metaprogramming become slower than
expect. This patch fixes the issue.
veselypeta pushed a commit to veselypeta/cherillvm that referenced this issue Aug 23, 2024
…r VarDecl

Close llvm/llvm-project#62796.

Previously, we didn't serialize the evaluated result for VarDecl. This
caused the compilation of template metaprogramming become slower than
expect. This patch fixes the issue.

This is a recommit tested with asan built clang.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules
Projects
None yet
Development

No branches or pull requests

4 participants