
wrong code at -O1/-O2 on x86_64-linux-gnu #95630


Closed
tangyixuan01 opened this issue Jun 15, 2024 · 11 comments

@tangyixuan01

tangyixuan01 commented Jun 15, 2024

Compiler explorer: https://godbolt.org/z/PP5eaEccb

Clang 18 produces wrong code when compiling the following program with "-O1" or "-O2", but produces the correct result with "-O3".

int printf(const char *, ...);
long a;
int b, c, e, g, i;
long *d, *h;
char f = 6;
int main() {
  long j;
  c = 0;
  for (; c != 7; ++c) {
    long k;
    long **l = &d;
    for (; f + i; i++)
      h = &k;
    g = h != (*l = &j);
    int *m = &b;
    *m = g;
    for (; e; a = a + 1)
      ;
  }
  printf("%d\n", b);
}

clang -O2 s.c; ./a.out
0

clang -O3 s.c; ./a.out
1

gcc -O2 s.c; ./a.out
1

@dtcxzyw
Member

dtcxzyw commented Jun 15, 2024

gcc -O0 s.c; ./a.out hangs. Is there an infinite loop?

@EugeneZelenko
Contributor

> gcc -O0 s.c; ./a.out hangs. Is there an infinite loop?

The third loop seems to be one.

@dtcxzyw added the "undefined behaviour" and "invalid" labels Jun 15, 2024
@dtcxzyw
Member

dtcxzyw commented Jun 15, 2024

I believe this code has UB as for (; e; a = a + 1); is a non-trivial infinite loop.

@dtcxzyw closed this as not planned Jun 15, 2024
@dtcxzyw
Member

dtcxzyw commented Jun 15, 2024

I guess you are using creduce. If you think the original case doesn't contain UB, can you share the code?

@tangyixuan01
Author

> I guess you are using creduce. If you think the original case doesn't contain UB, can you share the code?

Thanks for your reply!
Yes, I used creduce to reduce the code. I reduced it again and still get the same result after the reduction.
Compiler explorer: https://godbolt.org/z/jjxx9srM3

Please consider the following code:

int printf(const char *, ...);
int b, c, g, i;
long *d, *h;
int main() {
  long j;
  c = 0; 
  for (; c <1; ++c) {
    long k;
    long **l = &d;
    for (; i<1; i++)
      h = &k;
    g = h != (*l = &j);
    int *m = &b;
    *m = g;
  }
  printf("%d\n", b);
}

> clang -O2 s.c; ./a.out
0

> clang -O0 s.c; ./a.out
1

@dtcxzyw added the "confirmed" label and removed the "invalid" and "undefined behaviour" labels Jun 15, 2024
@dtcxzyw self-assigned this Jun 15, 2024
@dtcxzyw reopened this Jun 15, 2024
@dtcxzyw
Member

dtcxzyw commented Jun 15, 2024

Bisected to StackColoring.
Reproducer:

; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@c = dso_local local_unnamed_addr global i32 0, align 4
@d = dso_local local_unnamed_addr global ptr null, align 8
@i = dso_local local_unnamed_addr global i32 0, align 4
@h = dso_local local_unnamed_addr global ptr null, align 8
@g = dso_local local_unnamed_addr global i32 0, align 4
@b = dso_local local_unnamed_addr global i32 0, align 4
@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1

; Function Attrs: nofree nounwind uwtable
define dso_local noundef i32 @main() local_unnamed_addr #0 {
entry:
  %j = alloca i64, align 8
  %k = alloca i64, align 8
  call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %j) #3
  %h.promoted = load ptr, ptr @h, align 8, !tbaa !5
  %i.promoted11 = load i32, ptr @i, align 4, !tbaa !9
  call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %k) #3
  %cmp29 = icmp slt i32 %i.promoted11, 1
  br i1 %cmp29, label %for.body3.lr.ph, label %for.end

for.body3.lr.ph:                                  ; preds = %entry
  store ptr %k, ptr @h, align 8, !tbaa !5
  store i32 1, ptr @i, align 4, !tbaa !9
  br label %for.end

for.end:                                          ; preds = %for.body3.lr.ph, %entry
  %0 = phi ptr [ %k, %for.body3.lr.ph ], [ %h.promoted, %entry ]
  call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %k) #3
  %cmp4.le = icmp ne ptr %0, %j
  %conv.le = zext i1 %cmp4.le to i32
  store i32 1, ptr @c, align 4, !tbaa !9
  store ptr %j, ptr @d, align 8, !tbaa !5
  store i32 %conv.le, ptr @g, align 4, !tbaa !9
  store i32 %conv.le, ptr @b, align 4, !tbaa !9
  %call = call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str, i32 noundef %conv.le)
  call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %j) #3
  ret i32 0
}

; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #1

; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #1

; Function Attrs: nofree nounwind
declare noundef i32 @printf(ptr nocapture noundef readonly, ...) local_unnamed_addr #2

attributes #0 = { nofree nounwind uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }
attributes #2 = { nofree nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #3 = { nounwind }

!llvm.module.flags = !{!0, !1, !2, !3}
!llvm.ident = !{!4}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 8, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 2}
!4 = !{!"clang version 19.0.0git"}
!5 = !{!6, !6, i64 0}
!6 = !{!"any pointer", !7, i64 0}
!7 = !{!"omnipotent char", !8, i64 0}
!8 = !{!"Simple C/C++ TBAA"}
!9 = !{!10, !10, i64 0}
!10 = !{!"int", !7, i64 0}
> lli test.ll
0
> lli --force-interpreter test.ll
1
# End machine code for function main.

# Machine code for function main: IsSSA, TracksLiveness
Frame Objects:
  fi#0: size=8, align=8, at location [SP+8]
  fi#1: size=8, align=8, at location [SP+8]

0B	bb.0.entry:
	  successors: %bb.1(0x30000000), %bb.2(0x50000000); %bb.1(37.50%), %bb.2(62.50%)

16B	  LIFETIME_START %stack.0.j
32B	  %0:gr64 = MOV64rm $rip, 1, $noreg, @h, $noreg :: (dereferenceable load (s64) from @h, !tbaa !5)
48B	  CMP32mi $rip, 1, $noreg, @i, $noreg, 0, implicit-def $eflags :: (dereferenceable load (s32) from @i, !tbaa !9)
64B	  LIFETIME_START %stack.1.k
80B	  JCC_1 %bb.2, 15, implicit $eflags
96B	  JMP_1 %bb.1

112B	bb.1.for.body3.lr.ph:
	; predecessors: %bb.0
	  successors: %bb.2(0x80000000); %bb.2(100.00%)

128B	  %2:gr64 = LEA64r %stack.1.k, 1, $noreg, 0, $noreg
144B	  MOV64mr $rip, 1, $noreg, @h, $noreg, %2:gr64 :: (store (s64) into @h, !tbaa !5)
160B	  MOV32mi $rip, 1, $noreg, @i, $noreg, 1 :: (store (s32) into @i, !tbaa !9)

176B	bb.2.for.end:
	; predecessors: %bb.0, %bb.1

192B	  %1:gr64 = PHI %0:gr64, %bb.0, %2:gr64, %bb.1
208B	  LIFETIME_END %stack.1.k
224B	  %3:gr64 = LEA64r %stack.0.j, 1, $noreg, 0, $noreg
240B	  %4:gr64 = SUB64rr %1:gr64(tied-def 0), %3:gr64, implicit-def $eflags
256B	  %5:gr8 = SETCCr 5, implicit $eflags
272B	  %6:gr32 = MOVZX32rr8 killed %5:gr8
288B	  MOV32mi $rip, 1, $noreg, @c, $noreg, 1 :: (store (s32) into @c, !tbaa !9)
304B	  MOV64mr $rip, 1, $noreg, @d, $noreg, %3:gr64 :: (store (s64) into @d, !tbaa !5)
320B	  MOV32mr $rip, 1, $noreg, @g, $noreg, %6:gr32 :: (store (s32) into @g, !tbaa !9)
336B	  MOV32mr $rip, 1, $noreg, @b, $noreg, %6:gr32 :: (store (s32) into @b, !tbaa !9)
352B	  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
368B	  %7:gr64 = MOV32ri64 @.str
384B	  %8:gr32 = MOV32r0 implicit-def dead $eflags
400B	  %9:gr8 = COPY %8.sub_8bit:gr32
416B	  $rdi = COPY %7:gr64
432B	  $esi = COPY %6:gr32
448B	  $al = COPY %9:gr8
464B	  CALL64pcrel32 target-flags(x86-plt) @printf, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $esi, implicit $al, implicit-def $rsp, implicit-def $ssp, implicit-def $eax
480B	  ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
496B	  %10:gr32 = COPY $eax
512B	  LIFETIME_END %stack.0.j
528B	  $eax = COPY %8:gr32
544B	  RET 0, $eax

# End machine code for function main.

********** Stack Coloring **********
********** Function: main
Found a lifetime start marker for slot #0 with allocation: j
Found a lifetime start marker for slot #1 with allocation: k
Found a lifetime end marker for slot #1 with allocation: k
Found a lifetime end marker for slot #0 with allocation: j
Conservative slots : { 0 0 }
Found a use of slot #1 at %bb.1 index 128B with allocation: k
Found a use of slot #0 at %bb.2 index 224B with allocation: j
Found 4 markers and 2 slots
Slot structure:
Slot #0 - 8 bytes.
Slot #1 - 8 bytes.
Total Stack size: 16 bytes

Dataflow iterations: 2
Inspecting block #0 [entry]
BEGIN : { 0 0 }
END : { 0 0 }
LIVE_IN : { }
LIVE_OUT : { }
Inspecting block #1 [for.body3.lr.ph]
BEGIN : { 0 1 }
END : { 0 0 }
LIVE_IN : { }
LIVE_OUT : { 0 1 }
Inspecting block #2 [for.end]
BEGIN : { 0 0 }
END : { 1 1 }
LIVE_IN : { 0 1 }
LIVE_OUT : { }
Interval[0]:
$noreg [224B,512B:0) 0@0B-phi  weight:0.000000e+00
Interval[1]:
$physreg1 [128B,208B:0) 0@0B-phi  weight:0.000000e+00
Merging #0 and slots #1 together.
Merge 1 slots. Saved 8 bytes
Fixed 0 machine memory operands.
Fixed 0 debug locations.
Fixed 1 machine instructions.
Removed 4 markers.
DeadMachineInstructionElim: DELETING: %10:gr32 = COPY $eax

cc @RKSimon @topperc

@dtcxzyw
Member

dtcxzyw commented Jun 15, 2024

 %0 = phi ptr [ %k, %for.body3.lr.ph ], [ %h.promoted, %entry ]
  call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %k) #3
  %cmp4.le = icmp ne ptr %0, %j

As stack slots %j and %k are merged by StackColoring, %cmp4.le evaluates to false :(

@dtcxzyw
Member

dtcxzyw commented Jun 15, 2024

The issue goes away after I swap the @llvm.lifetime.end.p0 marker with %cmp4.le = icmp ne ptr %0, %j. I will post a fix later.

@nikic
Copy link
Contributor

nikic commented Jun 15, 2024

This sounds like the same issue as #45725, which is hard to fix.

I assume this is fuzzer-generated?

@dtcxzyw
Member

dtcxzyw commented Jun 15, 2024

> I assume this is fuzzer-generated?

I guess so.

@dtcxzyw removed their assignment Jun 25, 2024
@nikic
Contributor

nikic commented Jul 3, 2024

Closing this as a duplicate of the issue mentioned above.

@nikic closed this as not planned Jul 3, 2024
@RKSimon added the "duplicate" label Jul 3, 2024