Skip to content

[MMX] POSTRAScheduler disarrange emms and mmx instruction #35330

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
itaraban opened this issue Jan 17, 2018 · 11 comments
Open

[MMX] POSTRAScheduler disarrange emms and mmx instruction #35330

itaraban opened this issue Jan 17, 2018 · 11 comments
Labels
backend:X86 bugzilla Issues migrated from bugzilla

Comments

@itaraban
Copy link

itaraban commented Jan 17, 2018

Bugzilla Link 35982
Version trunk
OS Linux
Blocks #41664
CC @topperc,@efriedma-quic,@gnzlbg,@jyknight,@RKSimon,@zygoloid,@tstellar

Extended Description

POSTRAScheduler rearrange emms and mmx instruction, so we receive wrong result:

================= main.c ==============
#include <stdio.h>
#include <x86intrin.h>
float sum(__m64);
int main()
{
    float result;
    __m64 x = (_mm_set_pi32(5, 3));
    result = sum(x);
    printf("5 + 3 = %f\n", result);
    _mm_empty();

    return 0;
}

=======================================
================= nice.c ==============
#include <x86intrin.h>

float sum(__m64 x)
{
    double t;
    int x1, x2;

    x1 = _mm_cvtsi64_si32(x);
    x2 = _mm_cvtsi64_si32(_mm_unpackhi_pi32(x, x));

    _mm_empty();

    t = (float)x1 + (float)x2;

    return t;
}
=======================================

>>> clang -v
clang version 7.0.0 (trunk 322555)
Target: x86_64-unknown-linux-gnu
Thread model: posix
...


>>> clang -m32 -O0 -o nice.exe main.c nice.c
>>> ./nice.exe
5 + 3 = 8.000000

>>> clang -m32 -O2 -o nice.exe main.c nice.c
>>> ./nice.exe
5 + 3 = -nan

>>> clang -m32 -O2 -o nice.exe main.c nice.c -mllvm -opt-bisect-limit=194 && ./nice.exe
...
BISECT: running pass (193) Tail Duplication on function (sum)
BISECT: running pass (194) Machine Copy Propagation Pass on function (sum)
BISECT: NOT running pass (195) Post RA top-down list latency scheduler on function (sum)
BISECT: NOT running pass (196) Branch Probability Basic Block Placement on function (sum)
...
5 + 3 = 8

>>> clang -m32 -O2 -o nice.exe main.c nice.c -mllvm -opt-bisect-limit=195 && ./nice.exe
...
BISECT: running pass (193) Tail Duplication on function (sum)
BISECT: running pass (194) Machine Copy Propagation Pass on function (sum)
BISECT: running pass (195) Post RA top-down list latency scheduler on function (sum)
BISECT: NOT running pass (196) Branch Probability Basic Block Placement on function (sum)
BISECT: NOT running pass (197) X86 Execution Dependency Fix on function (sum)
...
5 + 3 = -nan

Let's look at ASM before and after POSTRAScheduler:
=============== nice-194.s ============
	cvtsi2ssl	%eax, %xmm0
	movq	8(%esp), %mm0
	emms
	punpckhdq	%mm0, %mm0      # mm0 = mm0[1,1]
	movd	%mm0, %ecx
	cvtsi2ssl	%ecx, %xmm1
=======================================
    
POSTRAScheduler changed order of operations to:
 
=============== nice-195.s ============
	movq	8(%esp), %mm0
	punpckhdq	%mm0, %mm0      # mm0 = mm0[1,1]
	movd	%mm0, %ecx
	emms
	cvtsi2ssl	%eax, %xmm0
	cvtsi2ssl	%ecx, %xmm1
=======================================
    
So now emms is placed before mmx operation, and as a result, we receive wrong answer.
@itaraban
Copy link
Author

itaraban commented Jan 17, 2018

Oops, I mixed up nice-194.s and nice-195.s.

So it should be:

=============== nice-194.s ============
	movq	8(%esp), %mm0
	punpckhdq	%mm0, %mm0      # mm0 = mm0[1,1]
	movd	%mm0, %ecx
	emms
	cvtsi2ssl	%eax, %xmm0
	cvtsi2ssl	%ecx, %xmm1
=======================================
=============== nice-195.s ============
	cvtsi2ssl	%eax, %xmm0
	movq	8(%esp), %mm0
	emms
	punpckhdq	%mm0, %mm0      # mm0 = mm0[1,1]
	movd	%mm0, %ecx
	cvtsi2ssl	%ecx, %xmm1
=======================================

@topperc
Copy link
Collaborator

topperc commented Jan 18, 2018

Simon, do we just need to mark EMMS/FEMMS as defing the MMX registers? That seems to at least fix this case. Maybe mark it as using them too?

@RKSimon
Copy link
Collaborator

RKSimon commented Jan 18, 2018

Simon, do we just need to mark EMMS/FEMMS as defing the MMX registers? That
seems to at least fix this case. Maybe mark it as using them too?

Yes, and any x87 registers as well. I was considering just making them terminators to prevent any crossover - even though that'd affect all instructions not just x87/mmx.

@efriedma-quic
Copy link
Collaborator

Simon, do we just need to mark EMMS/FEMMS as defing the MMX registers? That
seems to at least fix this case. Maybe mark it as using them too?

I don't think either of those is sufficient, given the definitions would be dead (so you're effectively just clobbering the registers).

The x86-64 ABI says "The CPU shall be in x87 mode upon entry to a function." i.e. the x87 tag word should be set to all ones. So the right way to handle this is to explicitly model the tag word as a register: MMX instrutions clobber it, emms defines it, and call/return instructions use it.

There are also other potential "scheduling" problems: currently, MMX intrinsics are marked IntrNoMem, so an IR transform could sink an MMX instruction past an EMMS. But that's sort of orthogonal to the MachineInstr modeling.

@gnzlbg
Copy link
Mannequin

gnzlbg mannequin commented Jan 21, 2019

Is this a duplicate of #15760 ? It does looks suspiciously similar.

I believe this bug causes undefined behavior in Rust code using MMX registers. We have had a lot of recurring bugs in linux, windows, and macos targets (mostly on 32-bit targets) over the last two years where suddenly some floating-point tests would intermittently fail in some systems and a couple of compiler builds later the failures would disappear just to reappear some time later.

We have started to manually emit emms using inline assembly and that has fixed some of these intermittent issues for good, but this is probably something worth fixing at the LLVM level.

Frontends using the floating-point LLVM intrinsics should have to use inline assembly to avoid undefined behavior.

@RKSimon
Copy link
Collaborator

RKSimon commented Jan 27, 2019

@topperc
Copy link
Collaborator

topperc commented Jan 30, 2019

A partial fix to clobber MM0-7 and ST0-7 on emms/femms was commited in r352642. This prevents the postRA scheduler from affecting the test case here.

This is not a complete fix and it can break in other ways.

@topperc
Copy link
Collaborator

topperc commented Feb 4, 2019

The fix committed in r352642 was reverted in r352782. It was recommitted in r353016. Hopefully it will stick this time.

@jyknight
Copy link
Member

jyknight commented Aug 30, 2020

Richard Smith came up with this test (i've slightly modified it for clarity). Compile for 64bit, -O2 -- https://godbolt.org/z/drn6zs.

#include <mmintrin.h>
void g(__m64);

long long f() {
  volatile long long a = {0}, b = {0};
// x87 usage
  volatile long double d = 1.0;
  long long aload = a, bload = b;
  double dload = d;
  bool s = __builtin_signbitl(dload);
// MMX usage
  __m64 c = _m_punpckhbw (_mm_cvtsi64_m64(aload), _mm_cvtsi64_m64(bload));
  if (s) g(c);
  long long result = _mm_cvtm64_si64(c);
  _mm_empty();
// MMX cleared
  return result;
}
_Z1fv: # @_Z1fv
  subq $72, %rsp
  movq $0, 24(%rsp)
  movq $0, 16(%rsp)
;; X87 instructions
  fld1
  fstpt 48(%rsp)
;; MMX instructions
  movq 24(%rsp), %mm0
  punpckhbw 16(%rsp), %mm0 # mm0 = mm0[4],mem[4],mm0[5],mem[5],mm0[6],mem[6],mm0[7],mem[7]
;; BROKEN: X87 instructions again...
  fldt 48(%rsp)
  fstpt 32(%rsp)
  movswq 40(%rsp), %rax
  testq %rax, %rax
  jns .LBB0_2
;; MMX instructions again...
  movq2dq %mm0, %xmm0
  movq %mm0, 8(%rsp) # 8-byte Spill
  callq _Z1gDv1_x
  movq 8(%rsp), %mm0 # 8-byte Reload
.LBB0_2:
  movq %mm0, %rax
;; Clear MMX state, safe to use x87 again.
  emms
  addq $72, %rsp
  retq

@topperc
Copy link
Collaborator

topperc commented Aug 30, 2020

James, this most recent issue isn't unique to llvm is it? I think gcc had the same issue before they switched to converting all mmx operations to SSE2 in recent versions of gcc.

@RKSimon
Copy link
Collaborator

RKSimon commented Nov 27, 2021

mentioned in issue #41664

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

5 participants