[C++20] [constexpr] Should we cache the result of constexpr/consteval expression? #61425


Open
ChuanqiXu9 opened this issue Mar 15, 2023 · 15 comments
Labels
c++20 clang:frontend Language frontend issues, e.g. anything involving "Sema" consteval C++20 consteval

Comments

@ChuanqiXu9
Member

ChuanqiXu9 commented Mar 15, 2023

I found this one while investigating build times with modules.

The reproducer:

// a.cpp
constexpr bool f() {
	for (unsigned n = 0; n != 1'000'000; ++n) {
	}
	return true;
}

template<typename>
struct s {
	static constexpr auto a = f();
#ifndef NO_MULTIPLE_CALL
	static constexpr auto b = f();
	static constexpr auto c = f();
	static constexpr auto d = f();
#endif
};

template struct s<int>;
template struct s<long>;

Let's run it with:

set -x
echo "Build time for duplicated"
time clang++ -std=c++20 a.cpp -c -o a.o
echo "Build time for not duplicated"
time clang++ -std=c++20 a.cpp -c -o a.o -DNO_MULTIPLE_CALL

The result on my machine shows:

Build time for duplicated

real	0m11.753s
user	0m11.611s
sys	0m0.016s
Build time for not duplicated

real	0m3.631s
user	0m3.576s
sys	0m0.013s

So clearly Clang doesn't cache the result of constexpr/consteval evaluation. This is significant when we try to use metaprogramming techniques. Is there any special reason?

@ChuanqiXu9 ChuanqiXu9 added c++20 consteval C++20 consteval labels Mar 15, 2023
@llvmbot
Member

llvmbot commented Mar 15, 2023

@llvm/issue-subscribers-c-20

@tbaederr
Contributor

I tried doing exactly that by simply adding an Expr* -> APValue cache in ASTContext, but there were some test failures, so I didn't proceed further. I think the problem is that you'd need to add a few flags to the hash key, e.g. whether something is only being evaluated for overflow, etc.

@EugeneZelenko EugeneZelenko added the clang:frontend Language frontend issues, e.g. anything involving "Sema" label Mar 15, 2023
@llvmbot
Member

llvmbot commented Mar 15, 2023

@llvm/issue-subscribers-clang-frontend

@tbaederr
Contributor

Also relevant discussion: #13222

@ChuanqiXu9
Member Author

That discussion didn't go into much detail. Let's mark it as a duplicate and track the topic in this one. (That one was filed by llvmbot, so it looks a bit outdated to me.)

@royjacobson
Contributor

royjacobson commented Mar 17, 2023

What happens if a template specialization is added between calls? Or with all the 'unique unevaluated lambda types' tricks? (https://github.com/DaemonSnake/unconstexpr-cpp20 for example)

C++23 also has static constexpr variables, but I'm not sure how they work here; constant evaluation doesn't allow mutable statics.

I guess it might be possible to compute when caching is safe, but I don't think it can work trivially.

There's also an easy workaround here with constexpr variables.

@ChuanqiXu9
Member Author

Sent https://reviews.llvm.org/D146410 to get a more precise feel for the proposed solution. There are several problems with the patch, but it solves the problem in this issue and passes all tests except CodeGenCXX/builtin_LINE.cpp. So I guess it might be a potential solution.

After applying the above patch, the output for the reproducer would be:

Build time for duplicated

real	0m1.825s
user	0m1.796s
sys	0m0.011s
Build time for not duplicated

real	0m1.829s
user	0m1.801s
sys	0m0.008s

It looks pretty good to me.


What happens if a template specialization is added between calls?

Explicit specialization of functions after instantiation is not allowed. I guess you meant to ask about the following case?

constexpr bool f(short) {
	for (unsigned n = 0; n != 1'000'000; ++n) {
	}
	return true;
}

constexpr auto a = f((int)43);
constexpr auto b = f((int)43);
constexpr auto c = f((int)43);
constexpr auto d = f((int)43);

constexpr bool f(int) {
	for (unsigned n = 0; n != 1'000'000; ++n) {
	}
	return false;
}

constexpr auto e = f((int)43);
constexpr auto g = f((int)43);
constexpr auto h = f((int)43);
constexpr auto i = f((int)43);

It takes 3.6s to compile this, so each of the two constexpr functions is evaluated exactly once.


Although the probability should be pretty low, the biggest problem with this solution is that the current hash algorithm is not collision-free in theory. It implies that the compiler may treat two different constexpr function calls as the same. On the other hand, ODRHash, as the name implies, has been used to perform ODR checking in modules for a pretty long time (including Clang modules, so the period is quite long in practice). If we can't accept the potential collisions (I think we will choose this path), we can pick a stronger hash algorithm like SHA-256, or use llvm::FoldingSetNodeID directly. But even SHA-256 can produce collisions, so we can't be completely sure either.

As for llvm::FoldingSetNodeID, it is actually a vector of unsigned used to identify a unique entity. That sounds great, doesn't it? The problem is that it is not good for modules. Since my intention is to reduce duplicated computation within modules, it would make the BMI (the byproduct of module compilation) much larger, and the size of the BMI is already big.

So personally, I prefer the current hash values (unsigned).

The second problem may be style. To be honest, the patch above is a bit ugly, and there are many more places I didn't touch. I'd like to move the process of evaluating expressions into ASTContext rather than Expr, but that requires a lot of changes, which I don't like. Still, I guess we have to do so if we want to make this work.

The other problem I can see is correctness. This may be the most important thing, but it is hard to discuss without the details.

For CodeGenCXX/builtin_LINE.cpp, the problem is:

constexpr int get_line_constexpr(int l = __builtin_LINE()) {
  return l;
}

int global_two = get_line_constexpr();
const int global_three(get_line_constexpr());

The second call to get_line_constexpr() would return the cached result from the first call, and I don't have a good solution for this.

CC: @AaronBallman @erichkeane @cor3ntin

@cor3ntin
Contributor

We'd need to be very careful.
constexpr functions are not pure, and neither are consteval functions. I think the main observable effect right now is source_location, but reflection, logging, and other things may affect the result of constant evaluation.

I'm not sure users can get themselves into a situation where constant evaluation would lead to a different result the second time if the first evaluation was successful, so we might be able to have a list of builtins that exclude an evaluation from caching. Maybe that would be enough?

@erichkeane
Collaborator

I think this probably needs an RFC. But as Corentin says, I'm VERY concerned here.

I think a less dangerous solution is the one @tbaederr is working on: the new bytecode-based interpreter, which can be cached better and executed more quickly.

@tbaederr
Contributor

tbaederr commented Mar 30, 2023

FYI the results with the new constant expression interpreter are:

# time gcc -std=c++20 -c ./a.cpp -fconstexpr-loop-limit=1000000
real    0m0.395s
user    0m0.364s
sys     0m0.030s

# time gcc -std=c++20 -c ./a.cpp -fconstexpr-loop-limit=1000000 -DNO_MULTIPLE_CALL
real    0m0.395s
user    0m0.366s
sys     0m0.028s

# time bin/clang++ -std=c++20 -c ./a.cpp
real    0m4.713s
user    0m4.693s
sys     0m0.014s

# time bin/clang++ -std=c++20 -c ./a.cpp -DNO_MULTIPLE_CALL
real    0m1.939s
user    0m1.926s
sys     0m0.010s

# time bin/clang++ -std=c++20 -c ./a.cpp -fexperimental-new-constant-interpreter
real    0m0.991s
user    0m0.979s
sys     0m0.011s

# time bin/clang++ -std=c++20 -c ./a.cpp -fexperimental-new-constant-interpreter -DNO_MULTIPLE_CALL
real    0m0.330s
user    0m0.317s
sys     0m0.013s

@cor3ntin
Contributor

@tbaederr do we know what kind of witchcraft GCC is performing? Those are very impressive performance numbers.

@tbaederr
Contributor

You were too fast, I just updated them. My clang numbers were with msan enabled, sorry.

@ChuanqiXu9
Member Author

@tbaederr the numbers look pretty good! Is there an RFC or a design doc where I can learn the higher-level design? And if -fexperimental-new-constant-interpreter is going to be enabled by default, I don't think I need to work on this.

@tbaederr
Contributor

In terms of high-level documentation, there is https://clang.llvm.org/docs/ConstantInterpreter.html, but large parts of that are out of date. I haven't found the time to update it yet.

@neko-para

I believe this would be quite useful if the compiler could promise to do so.

constexpr std::string func() { ... }
constexpr size_t N = func().size();
template <size_t M>
constexpr std::array<char, M> make_data() {
    std::string str = func();
    std::array<char, M> result{};  // std::array has no range constructor
    std::copy(str.begin(), str.end(), result.begin());  // needs <algorithm>
    result[M - 1] = 0;
    return result;
}
constexpr auto data = make_data<N + 1>();

As shown above, I have to evaluate func twice so that I can store the std::string into a std::array, which can cross the boundary between compile time and runtime.


8 participants