Skip to content

distinct lambdas incorrectly merged across modules #60985

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
zygoloid opened this issue Feb 25, 2023 · 7 comments
Closed

distinct lambdas incorrectly merged across modules #60985

zygoloid opened this issue Feb 25, 2023 · 7 comments
Assignees
Labels
c++11 c++20 clang:modules C++20 modules and Clang Header Modules miscompilation

Comments

@zygoloid
Copy link
Collaborator

zygoloid commented Feb 25, 2023

If a lambda appears within a variable template, then instantiations of that lambda have the lexical context of the variable as their lexical context. This causes Clang's serialization layer to be unable to distinguish them across modules, because its approach is to number unnamed declarations within their lexical context, and two distinct lambdas from different instantiations of the variable template can end up with the same number.

Example, ready to be dropped into test/Modules/:

// RUN: rm -rf %t
// RUN: mkdir -p %t
// RUN: split-file %s %t
//
// RUN: %clang_cc1 -std=c++20 -fmodules -fmodules-cache-path=%t -fmodule-map-file=%t/module.modulemap %t/use.cpp -emit-llvm -o - -triple x86_64-linux-gnu | FileCheck %s

//--- module.modulemap
module a { header "a.h" export * }
module b { header "b.h" export * }
module c { header "c.h" export * }


//--- a.h

void not_constant();
template<typename T> struct A {
  template<T M> static inline T N = [] { not_constant(); return M; } ();
};

//--- b.h
#include "a.h"

int b() { return A<int>::N<1>; }

//--- c.h
#include "a.h"

int c() { return A<int>::N<2>; }

//--- use.cpp
#include "b.h"
#include "c.h"

int bv = b();
int cv = c();

// We should not merge the two lambdas together, even though they will both
// have anonymous declaration number #1 within A<int>.

// CHECK: define {{.*}}global_var_init{{.*}} comdat($_ZN1AIiE1NILi1EEE) {
// CHECK: load i8, ptr @_ZGVN1AIiE1NILi1EEE, align 8
// CHECK: call {{.*} i32 @_ZNK1AIiE1NILi1EEMUlvE_clEv(
// CHECK: store i32 %call, ptr @_ZN1AIiE1NILi1EEE

// CHECK: define {{.*}}global_var_init{{.*}} comdat($_ZN1AIiE1NILi2EEE) {
// CHECK: load i8, ptr @_ZGVN1AIiE1NILi2EEE, align 8
// CHECK: call {{.*} i32 @_ZNK1AIiE1NILi2EEMUlvE_clEv(
// CHECK: store i32 %call, ptr @_ZN1AIiE1NILi2EEE

This test currently fails, because we generate calls to the A<int>::N<1> lambda to initialize both A<int>::N<1> and A<int>::N<2>. That happens because those two different lambdas get merged, because they're both numbered #1 within A<int> in the b and c module respectively.

We could perhaps fix the immediate issue by taking the context decl and mangling number into account when computing whether two lambdas should be merged, instead of numbering them inside their lexical context, at least when the context decl is a variable template specialization.

The same thing presumably happens when the variable template is at global scope instead of in a class template. That case seems especially hard to deal with, as there might be a local definition of the variable template in addition to ones from modules, and we don't have a good way to even find the lambdas that might have been instantiated locally. And the same thing presumably also happens for inline variables:

// header module 1
inline auto a = [] { return 1; };
// header module 2
inline auto b = [] { return 2; }
inline auto a = [] { return 1; }

... probably doesn't merge together the two a lambdas currently, and might merge the a from module 1 with the b from module 2 if we end up using lexical numbering here.

Perhaps we should address this by creating a lexical DeclContext corresponding to a variable, at least if we find that it's the context of a lambda. We would need this for instantiated variable template specializations and for inline variables, but having it in general seems useful.

@zygoloid zygoloid added bug Indicates an unexpected problem or unintended behavior c++ c++20 clang:modules C++20 modules and Clang Header Modules miscompilation labels Feb 25, 2023
@llvmbot
Copy link
Member

llvmbot commented Feb 25, 2023

@llvm/issue-subscribers-c-20

@llvmbot
Copy link
Member

llvmbot commented Feb 25, 2023

@llvm/issue-subscribers-c-1

@llvmbot
Copy link
Member

llvmbot commented Feb 25, 2023

@llvm/issue-subscribers-bug

@llvmbot
Copy link
Member

llvmbot commented Feb 25, 2023

@llvm/issue-subscribers-clang-modules

@EugeneZelenko EugeneZelenko removed bug Indicates an unexpected problem or unintended behavior c++ labels Feb 25, 2023
@llvmbot
Copy link
Member

llvmbot commented Feb 25, 2023

@llvm/issue-subscribers-c-11

@llvmbot
Copy link
Member

llvmbot commented Feb 25, 2023

@llvm/issue-subscribers-c-20

@zygoloid zygoloid self-assigned this Mar 10, 2023
@zygoloid
Copy link
Collaborator Author

cor3ntin pushed a commit to cor3ntin/llvm-project that referenced this issue May 25, 2023
Previously, distinct lambdas would get merged, and multiple definitions
of the same lambda would not get merged, because we attempted to
identify lambdas by their ordinal position within their lexical
DeclContext. This failed for lambdas within namespace-scope variables
and within variable templates, where the lexical position in the context
containing the variable didn't uniquely identify the lambda.

In this patch, we instead identify lambda closure types by index within
their context declaration, which does uniquely identify them in a way
that's consistent across modules.

This change causes a deserialization cycle between the type of a
variable with deduced type and a lambda appearing as the initializer of
the variable -- reading the variable's type requires reading and merging
the lambda, and reading the lambda requires reading and merging the
variable. This is addressed by deferring loading the deduced type of a
variable until after we finish recursive deserialization.

This also exposes a pre-existing subtle issue where loading a
variable declaration would trigger immediate loading of its initializer,
which could recursively refer back to properties of the variable. This
particularly causes problems if the initializer contains a
lambda-expression, but can be problematic in general. That is addressed
by switching to lazily loading the initializers of variables rather than
always loading them with the variable declaration. As well as fixing a
deserialization cycle, that should improve laziness of deserialization
in general.

LambdaDefinitionData had 63 spare bits in it, presumably caused by an
off-by-one-error in some previous change. This change claims 32 of those bits
as a counter for the lambda within its context. We could probably move the
numbering to separate storage, like we do for the device-side mangling number,
to optimize the likely-common case where all three numbers (host-side mangling
number, device-side mangling number, and index within the context declaration)
are zero, but that's not done in this change.

Fixes llvm#60985.

Differential Revision: https://reviews.llvm.org/D145737
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++11 c++20 clang:modules C++20 modules and Clang Header Modules miscompilation
Projects
Status: Done
Development

No branches or pull requests

3 participants