-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[GlobalISel] Fall back for bf16 conversions. #71470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We don't support these correctly since we don't yet have FP types. AMDGPU tests were silently miscompiling bf16 as if they were fp16.
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-amdgpu Author: Amara Emerson (aemerson) ChangesWe don't support these correctly since we don't yet have FP types. Patch is 68.88 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/71470.diff 7 Files Affected:
diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
index d8f9e30b2599779..3098c8ea468a9d1 100644
--- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -1484,6 +1484,9 @@ bool IRTranslator::translateBitCast(const User &U,
bool IRTranslator::translateCast(unsigned Opcode, const User &U,
MachineIRBuilder &MIRBuilder) {
+ if (U.getType()->getScalarType()->isBFloatTy() ||
+ U.getOperand(0)->getType()->getScalarType()->isBFloatTy())
+ return false;
Register Op = getOrCreateVReg(*U.getOperand(0));
Register Res = getOrCreateVReg(U);
MIRBuilder.buildInstr(Opcode, {Res}, {Op});
diff --git a/llvm/test/CodeGen/AMDGPU/fmed3-cast-combine.ll b/llvm/test/CodeGen/AMDGPU/fmed3-cast-combine.ll
index e3457421a49036e..e9bf515daabca9f 100644
--- a/llvm/test/CodeGen/AMDGPU/fmed3-cast-combine.ll
+++ b/llvm/test/CodeGen/AMDGPU/fmed3-cast-combine.ll
@@ -1,15 +1,15 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
; Test no legal f16. Should just keep the cast to f32 and
; RUN: llc -global-isel=0 -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri < %s | FileCheck -check-prefixes=GCN,GFX7,GFX7-SDAG %s
-; RUN: llc -global-isel=1 -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri < %s | FileCheck -check-prefixes=GCN,GFX7,GFX7-GISEL %s
+; RUN: llc -global-isel=1 -global-isel-abort=2 -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri < %s | FileCheck -check-prefixes=GCN,GFX7,GFX7-GISEL %s
; Test legal f16, no f16 fmed3. Should expand to min/max sequence
; RUN: llc -global-isel=0 -mtriple=amdgcn-amd-amdhsa -mcpu=fiji < %s | FileCheck -check-prefixes=GCN,GFX8,GFX8-SDAG %s
-; RUN: llc -global-isel=1 -mtriple=amdgcn-amd-amdhsa -mcpu=fiji < %s | FileCheck -check-prefixes=GCN,GFX8,GFX8-GISEL %s
+; RUN: llc -global-isel=1 -global-isel-abort=2 -mtriple=amdgcn-amd-amdhsa -mcpu=fiji < %s | FileCheck -check-prefixes=GCN,GFX8,GFX8-GISEL %s
; Legal f16 med3. InstCombine ought to shrink the f32 op to f16 so the codegen doesn't really matter for this.
; RUN: llc -global-isel=0 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GCN,GFX9,GFX9-SDAG %s
-; RUN: llc -global-isel=1 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GCN,GFX9,GFX9-GISEL %s
+; RUN: llc -global-isel=1 -global-isel-abort=2 -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GCN,GFX9,GFX9-GISEL %s
declare float @llvm.amdgcn.fmed3.f32(float, float, float) #0
@@ -773,61 +773,32 @@ define half @fmed3_fneg_fabs_f32_fpext_f16(half %arg0, half %arg1, half %arg2) #
; --------------------------------------------------------------------------------
define bfloat @fmed3_f32_fpext_f16_fptrunc_bf16(half %arg0, half %arg1, half %arg2) #1 {
-; GFX7-SDAG-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
-; GFX7-SDAG: ; %bb.0:
-; GFX7-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-SDAG-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX7-SDAG-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
-; GFX7-SDAG-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX7-GISEL-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
-; GFX7-GISEL: ; %bb.0:
-; GFX7-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-GISEL-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX7-GISEL-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX7-GISEL-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX7-GISEL-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX7-GISEL-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX7-GISEL-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX8-SDAG-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
-; GFX8-SDAG: ; %bb.0:
-; GFX8-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-SDAG-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX8-SDAG-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX8-SDAG-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX8-SDAG-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX8-SDAG-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
-; GFX8-SDAG-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX8-GISEL-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
-; GFX8-GISEL: ; %bb.0:
-; GFX8-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-GISEL-NEXT: v_min_f16_e32 v3, v0, v1
-; GFX8-GISEL-NEXT: v_max_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT: v_max_f16_e32 v1, v3, v2
-; GFX8-GISEL-NEXT: v_min_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX9-SDAG-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
-; GFX9-SDAG: ; %bb.0:
-; GFX9-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX9-SDAG-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX9-SDAG-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX9-SDAG-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX9-SDAG-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
-; GFX9-SDAG-NEXT: s_setpc_b64 s[30:31]
+; GFX7-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
+; GFX7: ; %bb.0:
+; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT: v_med3_f32 v0, v0, v1, v2
+; GFX7-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX7-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX8-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
+; GFX8: ; %bb.0:
+; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GFX8-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GFX8-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GFX8-NEXT: v_med3_f32 v0, v0, v1, v2
+; GFX8-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX8-NEXT: s_setpc_b64 s[30:31]
;
-; GFX9-GISEL-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
-; GFX9-GISEL: ; %bb.0:
-; GFX9-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX9-GISEL-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX9-GISEL-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX9-GISEL-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX9-GISEL-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX9-GISEL-NEXT: s_setpc_b64 s[30:31]
+; GFX9-LABEL: fmed3_f32_fpext_f16_fptrunc_bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GFX9-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GFX9-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GFX9-NEXT: v_med3_f32 v0, v0, v1, v2
+; GFX9-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
+; GFX9-NEXT: s_setpc_b64 s[30:31]
%arg0.ext = fpext half %arg0 to float
%arg1.ext = fpext half %arg1 to float
%arg2.ext = fpext half %arg2 to float
@@ -1039,56 +1010,27 @@ define half @fmed3_f32_fpext_f16_multi_use_2(half %arg0, half %arg1, half %arg2,
}
define half @fmed3_f32_fpext_bf16(bfloat %arg0, bfloat %arg1, bfloat %arg2) #1 {
-; GFX7-SDAG-LABEL: fmed3_f32_fpext_bf16:
-; GFX7-SDAG: ; %bb.0:
-; GFX7-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-SDAG-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX7-SDAG-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX7-SDAG-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX7-SDAG-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX7-GISEL-LABEL: fmed3_f32_fpext_bf16:
-; GFX7-GISEL: ; %bb.0:
-; GFX7-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-GISEL-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX7-GISEL-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX7-GISEL-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX7-GISEL-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX7-GISEL-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX7-GISEL-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX8-SDAG-LABEL: fmed3_f32_fpext_bf16:
-; GFX8-SDAG: ; %bb.0:
-; GFX8-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-SDAG-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX8-SDAG-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX8-SDAG-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX8-GISEL-LABEL: fmed3_f32_fpext_bf16:
-; GFX8-GISEL: ; %bb.0:
-; GFX8-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-GISEL-NEXT: v_min_f16_e32 v3, v0, v1
-; GFX8-GISEL-NEXT: v_max_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT: v_max_f16_e32 v1, v3, v2
-; GFX8-GISEL-NEXT: v_min_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX9-SDAG-LABEL: fmed3_f32_fpext_bf16:
-; GFX9-SDAG: ; %bb.0:
-; GFX9-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX9-SDAG-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX9-SDAG-NEXT: s_setpc_b64 s[30:31]
+; GFX7-LABEL: fmed3_f32_fpext_bf16:
+; GFX7: ; %bb.0:
+; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT: v_med3_f32 v0, v0, v1, v2
+; GFX7-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GFX7-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GFX7-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX8-LABEL: fmed3_f32_fpext_bf16:
+; GFX8: ; %bb.0:
+; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT: v_med3_f32 v0, v0, v1, v2
+; GFX8-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GFX8-NEXT: s_setpc_b64 s[30:31]
;
-; GFX9-GISEL-LABEL: fmed3_f32_fpext_bf16:
-; GFX9-GISEL: ; %bb.0:
-; GFX9-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX9-GISEL-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX9-GISEL-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX9-GISEL-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX9-GISEL-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX9-GISEL-NEXT: s_setpc_b64 s[30:31]
+; GFX9-LABEL: fmed3_f32_fpext_bf16:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_med3_f32 v0, v0, v1, v2
+; GFX9-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GFX9-NEXT: s_setpc_b64 s[30:31]
%arg0.ext = fpext bfloat %arg0 to float
%arg1.ext = fpext bfloat %arg1 to float
%arg2.ext = fpext bfloat %arg2 to float
@@ -1098,60 +1040,31 @@ define half @fmed3_f32_fpext_bf16(bfloat %arg0, bfloat %arg1, bfloat %arg2) #1 {
}
define half @fmed3_f32_fpext_f16_bf16_0(bfloat %arg0, half %arg1, half %arg2) #1 {
-; GFX7-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_0:
-; GFX7-SDAG: ; %bb.0:
-; GFX7-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-SDAG-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX7-SDAG-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX7-SDAG-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX7-SDAG-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX7-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_0:
-; GFX7-GISEL: ; %bb.0:
-; GFX7-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-GISEL-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX7-GISEL-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX7-GISEL-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX7-GISEL-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX7-GISEL-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX7-GISEL-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX8-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_0:
-; GFX8-SDAG: ; %bb.0:
-; GFX8-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-SDAG-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX8-SDAG-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX8-SDAG-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX8-SDAG-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX8-SDAG-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX8-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_0:
-; GFX8-GISEL: ; %bb.0:
-; GFX8-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-GISEL-NEXT: v_min_f16_e32 v3, v0, v1
-; GFX8-GISEL-NEXT: v_max_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT: v_max_f16_e32 v1, v3, v2
-; GFX8-GISEL-NEXT: v_min_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX9-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_0:
-; GFX9-SDAG: ; %bb.0:
-; GFX9-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX9-SDAG-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX9-SDAG-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX9-SDAG-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX9-SDAG-NEXT: s_setpc_b64 s[30:31]
+; GFX7-LABEL: fmed3_f32_fpext_f16_bf16_0:
+; GFX7: ; %bb.0:
+; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT: v_med3_f32 v0, v0, v1, v2
+; GFX7-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GFX7-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GFX7-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX8-LABEL: fmed3_f32_fpext_f16_bf16_0:
+; GFX8: ; %bb.0:
+; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GFX8-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GFX8-NEXT: v_med3_f32 v0, v0, v1, v2
+; GFX8-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GFX8-NEXT: s_setpc_b64 s[30:31]
;
-; GFX9-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_0:
-; GFX9-GISEL: ; %bb.0:
-; GFX9-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX9-GISEL-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX9-GISEL-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX9-GISEL-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX9-GISEL-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX9-GISEL-NEXT: s_setpc_b64 s[30:31]
+; GFX9-LABEL: fmed3_f32_fpext_f16_bf16_0:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GFX9-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GFX9-NEXT: v_med3_f32 v0, v0, v1, v2
+; GFX9-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GFX9-NEXT: s_setpc_b64 s[30:31]
%arg0.ext = fpext bfloat %arg0 to float
%arg1.ext = fpext half %arg1 to float
%arg2.ext = fpext half %arg2 to float
@@ -1161,60 +1074,31 @@ define half @fmed3_f32_fpext_f16_bf16_0(bfloat %arg0, half %arg1, half %arg2) #1
}
define half @fmed3_f32_fpext_f16_bf16_1(half %arg0, bfloat %arg1, half %arg2) #1 {
-; GFX7-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_1:
-; GFX7-SDAG: ; %bb.0:
-; GFX7-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-SDAG-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX7-SDAG-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX7-SDAG-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX7-SDAG-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX7-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_1:
-; GFX7-GISEL: ; %bb.0:
-; GFX7-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-GISEL-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX7-GISEL-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX7-GISEL-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX7-GISEL-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX7-GISEL-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX7-GISEL-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX8-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_1:
-; GFX8-SDAG: ; %bb.0:
-; GFX8-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-SDAG-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX8-SDAG-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX8-SDAG-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX8-SDAG-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX8-SDAG-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX8-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_1:
-; GFX8-GISEL: ; %bb.0:
-; GFX8-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-GISEL-NEXT: v_min_f16_e32 v3, v0, v1
-; GFX8-GISEL-NEXT: v_max_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT: v_max_f16_e32 v1, v3, v2
-; GFX8-GISEL-NEXT: v_min_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX9-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_1:
-; GFX9-SDAG: ; %bb.0:
-; GFX9-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX9-SDAG-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX9-SDAG-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX9-SDAG-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX9-SDAG-NEXT: s_setpc_b64 s[30:31]
+; GFX7-LABEL: fmed3_f32_fpext_f16_bf16_1:
+; GFX7: ; %bb.0:
+; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT: v_med3_f32 v0, v0, v1, v2
+; GFX7-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GFX7-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GFX7-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX8-LABEL: fmed3_f32_fpext_f16_bf16_1:
+; GFX8: ; %bb.0:
+; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GFX8-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GFX8-NEXT: v_med3_f32 v0, v0, v1, v2
+; GFX8-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GFX8-NEXT: s_setpc_b64 s[30:31]
;
-; GFX9-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_1:
-; GFX9-GISEL: ; %bb.0:
-; GFX9-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX9-GISEL-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX9-GISEL-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX9-GISEL-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX9-GISEL-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX9-GISEL-NEXT: s_setpc_b64 s[30:31]
+; GFX9-LABEL: fmed3_f32_fpext_f16_bf16_1:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GFX9-NEXT: v_cvt_f32_f16_e32 v2, v2
+; GFX9-NEXT: v_med3_f32 v0, v0, v1, v2
+; GFX9-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GFX9-NEXT: s_setpc_b64 s[30:31]
%arg0.ext = fpext half %arg0 to float
%arg1.ext = fpext bfloat %arg1 to float
%arg2.ext = fpext half %arg2 to float
@@ -1224,60 +1108,31 @@ define half @fmed3_f32_fpext_f16_bf16_1(half %arg0, bfloat %arg1, half %arg2) #1
}
define half @fmed3_f32_fpext_f16_bf16_2(half %arg0, half %arg1, bfloat %arg2) #1 {
-; GFX7-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_2:
-; GFX7-SDAG: ; %bb.0:
-; GFX7-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-SDAG-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX7-SDAG-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX7-SDAG-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX7-SDAG-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX7-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_2:
-; GFX7-GISEL: ; %bb.0:
-; GFX7-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-GISEL-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX7-GISEL-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX7-GISEL-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX7-GISEL-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX7-GISEL-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX7-GISEL-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX8-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_2:
-; GFX8-SDAG: ; %bb.0:
-; GFX8-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-SDAG-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX8-SDAG-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX8-SDAG-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX8-SDAG-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX8-SDAG-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX8-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_2:
-; GFX8-GISEL: ; %bb.0:
-; GFX8-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-GISEL-NEXT: v_min_f16_e32 v3, v0, v1
-; GFX8-GISEL-NEXT: v_max_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT: v_max_f16_e32 v1, v3, v2
-; GFX8-GISEL-NEXT: v_min_f16_e32 v0, v0, v1
-; GFX8-GISEL-NEXT: s_setpc_b64 s[30:31]
-;
-; GFX9-SDAG-LABEL: fmed3_f32_fpext_f16_bf16_2:
-; GFX9-SDAG: ; %bb.0:
-; GFX9-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX9-SDAG-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX9-SDAG-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX9-SDAG-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX9-SDAG-NEXT: s_setpc_b64 s[30:31]
+; GFX7-LABEL: fmed3_f32_fpext_f16_bf16_2:
+; GFX7: ; %bb.0:
+; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT: v_med3_f32 v0, v0, v1, v2
+; GFX7-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GFX7-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GFX7-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX8-LABEL: fmed3_f32_fpext_f16_bf16_2:
+; GFX8: ; %bb.0:
+; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GFX8-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GFX8-NEXT: v_med3_f32 v0, v0, v1, v2
+; GFX8-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GFX8-NEXT: s_setpc_b64 s[30:31]
;
-; GFX9-GISEL-LABEL: fmed3_f32_fpext_f16_bf16_2:
-; GFX9-GISEL: ; %bb.0:
-; GFX9-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT: v_cvt_f32_f16_e32 v0, v0
-; GFX9-GISEL-NEXT: v_cvt_f32_f16_e32 v1, v1
-; GFX9-GISEL-NEXT: v_cvt_f32_f16_e32 v2, v2
-; GFX9-GISEL-NEXT: v_med3_f32 v0, v0, v1, v2
-; GFX9-GISEL-NEXT: v_cvt_f16_f32_e32 v0, v0
-; GFX9-GISEL-NEXT: s_setpc_b64 s[30:31]
+; GFX9-LABEL: fmed3_f32_fpext_f16_bf16_2:
+; GFX9: ; %bb.0:
+; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_cvt_f32_f16_e32 v0, v0
+; GFX9-NEXT: v_cvt_f32_f16_e32 v1, v1
+; GFX9-NEXT: v_med3_f32 v0, v0, v1, v2
+; GFX9-NEXT: v_cvt_f16_f32_e32 v0, v0
+; GFX9-NEXT: s_setpc_b64 s[30:31]
%arg0.ext = fpext half %arg0 to float
%arg1.ext = fpext half %arg1 to float
%arg2.ext = fpext bfloat %arg2 to float
@@ -1488,4 +1343,3 @@ attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn me...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are also hidden conversions in all the ABI contexts ideally would skip as well
Nice. I was thinking of adding this check for aarch64 at least, it makes sense to do it generically. How do we think bf16 should be supported? As far as I understand the options would be
Maybe there are other options too? There are other (multiple) fp8 types being proposed currently. Any thought so far on what the best way to support them would be? |
We were talking about adding it to LLT. I think we should steal some bits from the size or element count and turn it into an enum for the class, such that any FP type would still be isScalar but identifiable |
OK. Sounds like that should work, so long as it was still simple to keep the types agnostic where its beneficial (load and stores and shuffles and whatnot). There are multiple types of fp8 already, two that get used in Arm, maybe more that 4 in total but I'm not sure how many will need to be supported. There is still a few bits in vector LLTs that could be used currently. |
Now an error is reported for a very simple bf16 data addition. Is this modification too violent?
|
No, this is exactly the kind of case that is supposed to be rejected |
clang -O0 -march=armv8.2-a+bf16 test.c I don't really understand that at this point our hardware already supports the bf16 data type. |
GlobalISel doesn't currently support bf16 properly until we add FP types. This change is supposed to trigger an abort or fallback to SDAG so that we don't silently miscompile bf16 code as if it were FP16. |
I just tried that test case and I see the crash. The crash is in SelectionDAG because there's incomplete support for bf16 in the backend. Previously clang was silently miscompiling this to FP16 because GlobalISel didn't know it was bf16. Also, bf16 in ARM doesn't provide you native support for general purpose FP arithmetic with bf16. It only adds some MLA/dotprod and conversions IIRC, so trying to add two bf16s would have to be done by promoting to float32 and then truncating back down. |
It was previously giving an error in the frontend that you are using an operation that is not supported, but that was broken at some point without making the backend work correctly. |
Is this the kind of error? https://godbolt.org/z/v8d4foaef |
No, that error doesn't make much sense |
Yep, that's the error. Operations like Add were previously not defined as they were not supported by the instruction set and there were no promotions for them. Someone changed how the frontend worked whilst only making the backend work for X86. |
Okay. thank you. I think I see what you mean. So, are there any plans to support the follow-up?@davemgreen @arsenm |
I don't know of anyone working on it, but it would be good to get it fixed. |
We don't support these correctly since we don't yet have FP types.
AMDGPU tests were silently miscompiling bf16 as if they were fp16.