-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[X86] Implement MMX intrinsics with SSE equivalents #41665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
If we switch to widening 64-bit vectors by default, instead of promoting them, the conversions would be free, so it wouldn't really matter. Otherwise, yes, this could get messy; we might need special "fake-MMX" intrinsics. |
Being implemented with:
|
This set of instructions was only supported by AMD chips starting in the K6-2 (introduced 1998), and before the "Bulldozer" family (2011). They were never much used, as they were effectively superseded by the more-widely-implemented SSE (first implemented on the AMD side in Athlon XP in 2001). This is being done as a predecessor towards general removal of MMX register usage. Since there is almost no usage of the 3DNow! intrinsics, and no modern hardware even implements them, simple removal seems like the best option. Support for the underlying LLVM intrinsics remains, for the moment. They will be removed in a future patch. (Originally uploaded in https://reviews.llvm.org/D94213) Works towards issue llvm#41665.
This set of instructions was only supported by AMD chips starting in the K6-2 (introduced 1998), and before the "Bulldozer" family (2011). They were never much used, as they were effectively superseded by the more-widely-implemented SSE (first implemented on the AMD side in Athlon XP in 2001). This is being done as a predecessor towards general removal of MMX register usage. Since there is almost no usage of the 3DNow! intrinsics, and no modern hardware even implements them, simple removal seems like the best option. Works towards issue llvm#41665.
of MMX instructions. The MMX instruction set is legacy, and the SSE2 variants are in every way superior, when they are available -- and they have been available since the Pentium 4 was released, 20 years ago. Therefore, we are switching the "MMX" intrinsics to depend on SSE2, unconditionally. This change entirely drops the ability to generate vectorized code using compiler intrinsics for chips with MMX but without SSE2: the Intel Pentium MMX, Pentium, II, and Pentium III (released 1997-1999), as well as AMD K6 and K7 series chips of around the same timeframe. (Note that targeting these older CPUs remains supported, simply without the ability to use MMX compiler intrinsics.) Migrating away from the use of MMX also fixes a rather non-obvious requirement for users of the intrinsics API. The long-standing programming model for MMX requires that the programmer be aware of the x87/MMX mode-switching semantics, and manually call _mm_empty() between using any MMX instruction and any x87 FPU instruction. If you neglect to, then every future x87 operation will return a NaN result. This requirement is not at all obvious to users of these these intrinsics, and causes very difficult to detect bugs. Additionally, in some circumstanes LLVM may reorder x87 and mmx operations around each-other, unaware of this mode switching issue. So, even inserting _mm_empty() calls appropriately will not always guarantee correct operation. Eliminating the use of MMX instructions fixes both these latter issues. Works towards issue llvm#41665.
This set of instructions was only supported by AMD chips starting in the K6-2 (introduced 1998), and before the "Bulldozer" family (2011). They were never much used, as they were effectively superseded by the more-widely-implemented SSE (first implemented on the AMD side in Athlon XP in 2001). This is being done as a predecessor towards general removal of MMX register usage. Since there is almost no usage of the 3DNow! intrinsics, and no modern hardware even implements them, simple removal seems like the best option. (Clang half originally uploaded in https://reviews.llvm.org/D94213) Works towards issue #41665 and issue #98272.
… of MMX. (#96540) The MMX instruction set is legacy, and the SSE2 variants are in every way superior, when they are available -- and they have been available since the Pentium 4 was released, 20 years ago. Therefore, we are switching the "MMX" intrinsics to depend on SSE2, unconditionally. This change entirely drops the ability to generate vectorized code using compiler intrinsics for chips with MMX but without SSE2: the Intel Pentium MMX, Pentium, II, and Pentium III (released 1997-1999), as well as AMD K6 and K7 series chips of around the same timeframe. Targeting these older CPUs remains supported -- simply without the ability to use MMX compiler intrinsics. Migrating away from the use of MMX registers also fixes a rather non-obvious requirement. The long-standing programming model for these MMX intrinsics requires that the programmer be aware of the x87/MMX mode-switching semantics, and manually call `_mm_empty()` between using any MMX instruction and any x87 FPU instruction. If you neglect to, then every future x87 operation will return a NaN result. This requirement is not at all obvious to users of these these intrinsic functions, and causes very difficult to detect bugs. Worse, even if the user did write code that correctly calls `_mm_empty()` in the right places, LLVM may sometimes reorder x87 and mmx operations around each-other, unaware of this mode switching issue. Eliminating the use of MMX registers eliminates this problem. This change also deletes the now-unnecessary MMX `__builtin_ia32_*` functions from Clang. Only 3 MMX-related builtins remain in use -- `__builtin_ia32_emms`, used by `_mm_empty`, and `__builtin_ia32_vec_{ext,set}_v4si`, used by `_mm_insert_pi16` and `_mm_extract_pi16`. Note particularly that the latter two lower to generic, non-MMX, IR. Support for the LLVM intrinsics underlying these removed builtins still remains, for the moment. The file `clang/www/builtins.py` has been updated with mappings from the newly-removed `__builtin_ia32` functions to the still-supported equivalents in `mmintrin.h`. (Originally uploaded at https://reviews.llvm.org/D86855 and https://reviews.llvm.org/D94252) Fixes issue #41665 Works towards #98272
This set of instructions was only supported by AMD chips starting in the K6-2 (introduced 1998), and before the "Bulldozer" family (2011). They were never much used, as they were effectively superseded by the more-widely-implemented SSE (first implemented on the AMD side in Athlon XP in 2001). This is being done as a predecessor towards general removal of MMX register usage. Since there is almost no usage of the 3DNow! intrinsics, and no modern hardware even implements them, simple removal seems like the best option. (Clang half originally uploaded in https://reviews.llvm.org/D94213) Works towards issue #41665 and issue #98272.
… of MMX. (#96540) Summary: The MMX instruction set is legacy, and the SSE2 variants are in every way superior, when they are available -- and they have been available since the Pentium 4 was released, 20 years ago. Therefore, we are switching the "MMX" intrinsics to depend on SSE2, unconditionally. This change entirely drops the ability to generate vectorized code using compiler intrinsics for chips with MMX but without SSE2: the Intel Pentium MMX, Pentium, II, and Pentium III (released 1997-1999), as well as AMD K6 and K7 series chips of around the same timeframe. Targeting these older CPUs remains supported -- simply without the ability to use MMX compiler intrinsics. Migrating away from the use of MMX registers also fixes a rather non-obvious requirement. The long-standing programming model for these MMX intrinsics requires that the programmer be aware of the x87/MMX mode-switching semantics, and manually call `_mm_empty()` between using any MMX instruction and any x87 FPU instruction. If you neglect to, then every future x87 operation will return a NaN result. This requirement is not at all obvious to users of these these intrinsic functions, and causes very difficult to detect bugs. Worse, even if the user did write code that correctly calls `_mm_empty()` in the right places, LLVM may sometimes reorder x87 and mmx operations around each-other, unaware of this mode switching issue. Eliminating the use of MMX registers eliminates this problem. This change also deletes the now-unnecessary MMX `__builtin_ia32_*` functions from Clang. Only 3 MMX-related builtins remain in use -- `__builtin_ia32_emms`, used by `_mm_empty`, and `__builtin_ia32_vec_{ext,set}_v4si`, used by `_mm_insert_pi16` and `_mm_extract_pi16`. Note particularly that the latter two lower to generic, non-MMX, IR. Support for the LLVM intrinsics underlying these removed builtins still remains, for the moment. The file `clang/www/builtins.py` has been updated with mappings from the newly-removed `__builtin_ia32` functions to the still-supported equivalents in `mmintrin.h`. (Originally uploaded at https://reviews.llvm.org/D86855 and https://reviews.llvm.org/D94252) Fixes issue #41665 Works towards #98272 Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250580
Fixed via the above PRS; MMX intrinsics now use SSE2. |
Extended Description
Similar to what's been proposed recently for gcc, we should investigate promoting MMX intrinsics to SSE equivalents:
https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00061.html
This probably would be best handled in CGBuiltin.cpp - replacing the MMX builtins with SSE equivalents, although some can probably done in headers as well behind a suitable define.
NOTE: This will cause a high number of subvector insertions/extractions, we might need some mechanism to reduce this even without optimizations.
The text was updated successfully, but these errors were encountered: