[X86] Implement MMX intrinsics with SSE equivalents #41665

RKSimon · 2019-06-19T10:52:12Z


Bugzilla Link	42320
Version	trunk
OS	Windows NT
CC	@topperc,@efriedma-quic,@jyknight,@RKSimon,@rotateright

Extended Description

Similar to what's been proposed recently for gcc, we should investigate promoting MMX intrinsics to SSE equivalents:

https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00061.html

This would probably be best handled in CGBuiltin.cpp by replacing the MMX builtins with SSE equivalents, although some can probably be done in the headers as well, behind a suitable define.

NOTE: This will cause a high number of subvector insertions/extractions, we might need some mechanism to reduce this even without optimizations.
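A minimal sketch of what one promoted intrinsic might look like, assuming SSE2 is available. The helper name `add_pi16_sse2` is hypothetical, and the 64-bit MMX value is modelled as a plain `uint64_t` rather than `__m64`; this is not the code Clang ended up using. It shows where the subvector traffic comes from: each 64-bit operand is inserted into the low half of an XMM register, and the 64-bit result is extracted again afterwards.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: the operation MMX PADDW performs on four packed
   16-bit lanes, done with the 128-bit SSE2 PADDW instead. */
static uint64_t add_pi16_sse2(uint64_t a, uint64_t b) {
    /* Insert: MOVQ places each operand in the low 64 bits of an XMM
       register and zeroes the upper 64 bits. */
    __m128i va = _mm_loadl_epi64((const __m128i *)&a);
    __m128i vb = _mm_loadl_epi64((const __m128i *)&b);

    /* Lane-wise wrapping 16-bit add; the upper four lanes just add 0 + 0. */
    __m128i sum = _mm_add_epi16(va, vb);

    /* Extract: keep only the low 64 bits of the result. */
    uint64_t r;
    _mm_storel_epi64((__m128i *)&r, sum);
    return r;
}
```

Lanes wrap independently, so `add_pi16_sse2(0xFFFF, 1)` yields 0 with no carry into lane 1.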

efriedma-quic · 2019-06-19T16:04:54Z

NOTE: This will cause a high number of subvector insertions/extractions, we might need some mechanism to reduce this even without optimizations.

If we switch to widening 64-bit vectors by default, instead of promoting them, the conversions would be free, so it wouldn't really matter. Otherwise, yes, this could get messy; we might need special "fake-MMX" intrinsics.
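To illustrate why widening would make the conversions free, the sketch below models the two type-legalization strategies for a 4 x i16 vector using SSE2 intrinsics. It is an approximation under stated assumptions, not LLVM's legalizer, and both function names are hypothetical. With widening, the 64-bit value keeps its bit layout in the low half of the register, so moving it in and out is a plain load/store. With promotion, each 16-bit element sits in its own 32-bit slot (zero-filled here, whereas LLVM leaves those extra bits unspecified), so recovering the packed 64-bit value takes three shuffles.

```c
#include <emmintrin.h> /* SSE2; also provides _MM_SHUFFLE */
#include <stdint.h>

/* Widening model: <4 x i16> lives unchanged in the low half of an
   <8 x i16> register, so converting back to 64 bits is a single MOVQ. */
static uint64_t roundtrip_widened(uint64_t v) {
    __m128i w = _mm_loadl_epi64((const __m128i *)&v);
    uint64_t out;
    _mm_storel_epi64((__m128i *)&out, w);
    return out;
}

/* Promotion model: <4 x i16> becomes <4 x i32>, one element per 32-bit
   slot, so the packed 64-bit value must be reassembled before extraction. */
static uint64_t roundtrip_promoted(uint64_t v) {
    __m128i w = _mm_loadl_epi64((const __m128i *)&v);
    __m128i p = _mm_unpacklo_epi16(w, _mm_setzero_si128()); /* e0,0,e1,0,e2,0,e3,0 */

    /* Compact the even 16-bit lanes back into lanes 0..3. */
    p = _mm_shufflelo_epi16(p, _MM_SHUFFLE(3, 1, 2, 0)); /* e0,e1,0,0 | e2,0,e3,0 */
    p = _mm_shufflehi_epi16(p, _MM_SHUFFLE(3, 1, 2, 0)); /* e0,e1,0,0 | e2,e3,0,0 */
    p = _mm_shuffle_epi32(p, _MM_SHUFFLE(3, 1, 2, 0));   /* e0,e1,e2,e3 | 0,0,0,0 */

    uint64_t out;
    _mm_storel_epi64((__m128i *)&out, p);
    return out;
}
```

Both functions return their input unchanged; the difference is only in how much work the round trip costs.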

jyknight · 2021-01-09T19:04:41Z

Being implemented with:
- https://reviews.llvm.org/D86855: Convert __m64 intrinsics to unconditionally use SSE2 instead of MMX instructions
- https://reviews.llvm.org/D94213: Clang: Remove support for 3DNow!, both intrinsics and builtins.
- https://reviews.llvm.org/D94252: Delete (most) of the MMX builtin functions from Clang.

This set of instructions was supported only by AMD chips from the K6-2 (introduced 1998) up to, but not including, the "Bulldozer" family (2011). They were never much used, as they were effectively superseded by the more widely implemented SSE (first implemented on the AMD side in the Athlon XP in 2001). This is being done as a precursor to the general removal of MMX register usage. Since there is almost no usage of the 3DNow! intrinsics, and no modern hardware even implements them, simple removal seems like the best option. Support for the underlying LLVM intrinsics remains, for the moment. They will be removed in a future patch. (Originally uploaded in https://reviews.llvm.org/D94213) Works towards issue llvm#41665.

This set of instructions was supported only by AMD chips from the K6-2 (introduced 1998) up to, but not including, the "Bulldozer" family (2011). They were never much used, as they were effectively superseded by the more widely implemented SSE (first implemented on the AMD side in the Athlon XP in 2001). This is being done as a precursor to the general removal of MMX register usage. Since there is almost no usage of the 3DNow! intrinsics, and no modern hardware even implements them, simple removal seems like the best option. Works towards issue llvm#41665.

… of MMX instructions. The MMX instruction set is legacy, and the SSE2 variants are in every way superior, when they are available -- and they have been available since the Pentium 4 was released, 20 years ago. Therefore, we are switching the "MMX" intrinsics to depend on SSE2, unconditionally. This change entirely drops the ability to generate vectorized code using compiler intrinsics for chips with MMX but without SSE2: the Intel Pentium MMX, Pentium II, and Pentium III (released 1997-1999), as well as AMD K6 and K7 series chips of around the same timeframe. (Note that targeting these older CPUs remains supported, simply without the ability to use MMX compiler intrinsics.) Migrating away from the use of MMX also fixes a rather non-obvious requirement for users of the intrinsics API. The long-standing programming model for MMX requires that the programmer be aware of the x87/MMX mode-switching semantics, and manually call _mm_empty() between using any MMX instruction and any x87 FPU instruction. If you neglect to, then every future x87 operation will return a NaN result. This requirement is not at all obvious to users of these intrinsics, and causes bugs that are very difficult to detect. Additionally, in some circumstances LLVM may reorder x87 and MMX operations around each other, unaware of this mode-switching issue. So even inserting _mm_empty() calls appropriately will not always guarantee correct operation. Eliminating the use of MMX instructions fixes both of these issues. Works towards issue llvm#41665.
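For reference, a sketch of the legacy programming model described above. It assumes an x86 target where `long double` arithmetic runs on the x87 FPU (the case for the usual x86 System V ABIs; with MSVC, `long double` is just `double`), and the function name `mmx_then_x87` is hypothetical. The `_mm_empty()` call has to sit between the last MMX intrinsic and the first x87 operation; with the SSE2-based Clang headers the call is still accepted, but correctness no longer depends on it.

```c
#include <mmintrin.h>

/* Hypothetical example: MMX-intrinsic integer work followed by x87 math. */
static long double mmx_then_x87(void) {
    __m64 a = _mm_set_pi16(4, 3, 2, 1);     /* lanes 0..3 = 1, 2, 3, 4 */
    __m64 b = _mm_set_pi16(40, 30, 20, 10); /* lanes 0..3 = 10, 20, 30, 40 */
    __m64 s = _mm_add_pi16(a, b);           /* lanes 0..3 = 11, 22, 33, 44 */
    int low32 = _mm_cvtsi64_si32(s);        /* lane 0 | (lane 1 << 16) */

    /* Legacy rule: EMMS must execute before any x87 instruction; otherwise
       the x87 register stack is still in MMX state and x87 ops yield NaN. */
    _mm_empty();

    /* long double arithmetic uses the x87 FPU on the assumed ABIs. */
    long double lane0 = (long double)(low32 & 0xFFFF);
    return lane0 * 2.0L;
}
```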

This set of instructions was supported only by AMD chips from the K6-2 (introduced 1998) up to, but not including, the "Bulldozer" family (2011). They were never much used, as they were effectively superseded by the more widely implemented SSE (first implemented on the AMD side in the Athlon XP in 2001). This is being done as a precursor to the general removal of MMX register usage. Since there is almost no usage of the 3DNow! intrinsics, and no modern hardware even implements them, simple removal seems like the best option. (Clang half originally uploaded in https://reviews.llvm.org/D94213) Works towards issue #41665 and issue #98272.

… of MMX. (#96540) The MMX instruction set is legacy, and the SSE2 variants are in every way superior, when they are available -- and they have been available since the Pentium 4 was released, 20 years ago. Therefore, we are switching the "MMX" intrinsics to depend on SSE2, unconditionally. This change entirely drops the ability to generate vectorized code using compiler intrinsics for chips with MMX but without SSE2: the Intel Pentium MMX, Pentium II, and Pentium III (released 1997-1999), as well as AMD K6 and K7 series chips of around the same timeframe. Targeting these older CPUs remains supported -- simply without the ability to use MMX compiler intrinsics. Migrating away from the use of MMX registers also fixes a rather non-obvious requirement. The long-standing programming model for these MMX intrinsics requires that the programmer be aware of the x87/MMX mode-switching semantics, and manually call `_mm_empty()` between using any MMX instruction and any x87 FPU instruction. If you neglect to, then every future x87 operation will return a NaN result. This requirement is not at all obvious to users of these intrinsic functions, and causes bugs that are very difficult to detect. Worse, even if the user did write code that correctly calls `_mm_empty()` in the right places, LLVM may sometimes reorder x87 and MMX operations around each other, unaware of this mode-switching issue. Eliminating the use of MMX registers eliminates this problem. This change also deletes the now-unnecessary MMX `__builtin_ia32_*` functions from Clang. Only 3 MMX-related builtins remain in use -- `__builtin_ia32_emms`, used by `_mm_empty`, and `__builtin_ia32_vec_{ext,set}_v4hi`, used by `_mm_extract_pi16` and `_mm_insert_pi16`. Note particularly that the latter two lower to generic, non-MMX, IR. Support for the LLVM intrinsics underlying these removed builtins still remains, for the moment.
The file `clang/www/builtins.py` has been updated with mappings from the newly-removed `__builtin_ia32` functions to the still-supported equivalents in `mmintrin.h`. (Originally uploaded at https://reviews.llvm.org/D86855 and https://reviews.llvm.org/D94252) Fixes issue #41665. Works towards #98272.
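A short usage sketch of the two header functions named above, whose builtins (per the commit message) lower to generic, non-MMX IR. It assumes an x86 target with SSE, since both functions are declared in `xmmintrin.h`; the function name `replace_lane2` is hypothetical. Code that previously called an `__builtin_ia32_*` spelling directly would call the corresponding header function in the same way.

```c
#include <xmmintrin.h> /* _mm_insert_pi16 / _mm_extract_pi16 (SSE) */

/* Hypothetical example: overwrite 16-bit lane 2 of an __m64, then read it back. */
static int replace_lane2(void) {
    __m64 v = _mm_set_pi16(4, 3, 2, 1); /* lanes 0..3 = 1, 2, 3, 4 */
    v = _mm_insert_pi16(v, 99, 2);      /* lane index must be a constant */
    int lane2 = _mm_extract_pi16(v, 2); /* lane value, zero-extended to int */
    _mm_empty();                        /* keeps the code valid under the legacy MMX model */
    return lane2;
}
```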



jyknight · 2024-07-25T20:04:40Z

Fixed via the above PRs; MMX intrinsics now use SSE2.

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

RKSimon mentioned this issue Apr 8, 2022

[X86] some builtins generate incorrect code for shifts with large (constant) shift counts #43267

Closed

jyknight mentioned this issue Jun 20, 2024

Remove support for 3DNow!, both intrinsics and builtins. #96246

Merged

jyknight mentioned this issue Jun 24, 2024

Clang: convert __m64 intrinsics to unconditionally use SSE2 instead of MMX. #96540

Merged

jyknight mentioned this issue Jul 10, 2024

X86: Delete MMX types/intrinsics from LLVM IR/backends #98272

Open


jyknight closed this as completed Jul 25, 2024
