JIT: Emit mulx for GT_MULHI and GT_MUL_LONG if BMI2 is available #116198


Merged: 24 commits into dotnet:main, Jun 20, 2025

Conversation

@Daniel-Svensson (Contributor) commented on Jun 1, 2025:

Summary

Overview:

  • Allows the JIT to emit MULX for GT_MULHI (used when dividing by a constant) and for GT_MUL_LONG
    • Using mulx should allow more flexible register allocation
  • Fixes the containment check for GT_MUL_LONG on x86 (it never succeeded due to mismatched register sizes)

Minor changes:

  • Use the IsUnsigned() helper instead of bit manipulation, for clarity
  • During lowering (containment check), place any memory operand as "op2" for multiply to allow simpler code (see the sketch below)
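
A minimal sketch of that lowering-time swap (illustrative only; not the actual lowerxarch.cpp change, and the direct field access is simplified):

// Sketch: if a contained memory operand ended up as op1 of a commutative multiply,
// swap the operands so codegen only ever has to handle a memory operand in op2.
void PlaceMemoryOperandAsOp2(GenTreeOp* mul)
{
    GenTree* op1 = mul->gtGetOp1();
    GenTree* op2 = mul->gtGetOp2();

    if (op1->isContained() && !op2->isContained())
    {
        mul->gtOp1 = op2;
        mul->gtOp2 = op1;
    }
}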

Copilot Summary


Enhancements for BMI2 Instruction Set:

  • Added support for the MULX instruction when BMI2 is available, enabling efficient unsigned multiplication without implicitly clobbering RDX. This includes logic to handle operand placement and instruction emission for both GT_MULHI and GT_MUL_LONG. (src/coreclr/jit/codegenxarch.cpp)

Operand and Memory Containment Logic:

  • Updated ContainCheckMul to adjust operand types (nodeType) for GT_MUL_LONG and ensure safe containment of memory operands.
  • Added operand swapping to guarantee contained memory operands are always op2. (src/coreclr/jit/lowerxarch.cpp)

Register Allocation and Kill Set Updates:

  • Modified LinearScan::getKillSetForMul to avoid killing RDX when using MULX, and added logic to differentiate between the base instructions and the BMI2-specific instructions (see the sketch below). (src/coreclr/jit/lsrabuild.cpp)
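
A minimal sketch of the decision this bullet describes (illustrative only; names, masks, and structure are simplified and are not taken from lsrabuild.cpp):

// Sketch: widening mul/imul implicitly define edx:eax, so both registers belong in the
// kill set. mulx writes explicit destination registers and only reads RDX as its implicit
// source, so an unsigned multiply that will emit mulx does not need to kill RDX.
regMaskTP GetKillSetForMulSketch(Compiler* compiler, GenTreeOp* mulNode)
{
    bool useMulx = mulNode->IsUnsigned() &&
                   compiler->compOpportunisticallyDependsOn(InstructionSet_BMI2);

    return useMulx ? RBM_NONE : (RBM_RAX | RBM_RDX); // x64-style mask names for brevity
}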

General Refactoring:

  • Simplified flag checks by replacing bitwise operations with IsUnsigned() method calls for clarity and consistency. (src/coreclr/jit/lowerxarch.cpp; src/coreclr/jit/lsraxarch.cpp)
  • Adjusted BuildMul to account for BMI2-specific operand handling and register constraints, ensuring proper use of implicit registers like RAX and RDX. (src/coreclr/jit/lsraxarch.cpp)

These changes enhance the compiler's efficiency and maintainability, particularly for architectures supporting BMI2, while ensuring correctness in operand handling and memory containment.

Code generation examples

Simple

static unsafe ulong TestBigMulINT2(uint* arr, uint b)
{
    return Math.BigMul(b, arr[0]) + Math.BigMul(b, arr[1]);
}

With BMI2


; Method Program:<<Main>$>g__TestBigMulINT2|0_15(uint,uint):ulong (FullOpts)
G_M34028_IG01:  ;; offset=0x0000
       push     esi
       sub      esp, 16
						;; size=4 bbWeight=1 PerfScore 1.25

G_M34028_IG02:  ;; offset=0x0004
       mulx     eax, esi, dword ptr [ecx]
       mulx     edx, ecx, dword ptr [ecx+0x04]
       add      eax, edx
       mov      edx, esi
       adc      edx, ecx
						;; size=17 bbWeight=1 PerfScore 13.00

G_M34028_IG03:  ;; offset=0x0015
       add      esp, 16
       pop      esi
       ret      
						;; size=5 bbWeight=1 PerfScore 1.75
; Total bytes of code: 26

Without BMI2


; Method Program:<<Main>$>g__TestBigMulINT2|0_15(uint,uint):ulong (FullOpts)
G_M34028_IG01:  ;; offset=0x0000
       push     edi
       push     esi
       push     ebx
       sub      esp, 16
       mov      esi, edx
						;; size=8 bbWeight=1 PerfScore 3.50

G_M34028_IG02:  ;; offset=0x0008
       mov      eax, esi
       mul      edx:eax, dword ptr [ecx]
       mov      edi, eax
       mov      ebx, edx
       mov      eax, esi
       mul      edx:eax, dword ptr [ecx+0x04]
       add      eax, edi
       adc      edx, ebx
						;; size=17 bbWeight=1 PerfScore 13.75

G_M34028_IG03:  ;; offset=0x0019
       add      esp, 16
       pop      ebx
       pop      esi
       pop      edi
       ret      
						;; size=7 bbWeight=1 PerfScore 2.75
; Total bytes of code: 32

BigMul

The following code is generated for this Math.BigMul variant:

static ulong BigMul(ulong a, uint b, out ulong low)
{
    ulong prodL = ((ulong)(uint)a) * b;
    ulong prodH = (prodL >> 32) + (((ulong)(uint)(a >> 32)) * b);

    low = ((prodH << 32) | (uint)prodL);
    return (prodH >> 32);
}
Codegen with BMI2 (the BMI2 version needs fewer push/pop instructions thanks to better argument register usage):

; Method Program:<<Main>$>g__BigMul|0_14(ulong,uint,byref):ulong (FullOpts)
G_M22501_IG01:  ;; offset=0x0000
       push     esi
       sub      esp, 24
       mov      bword ptr [esp], edx
       mov      eax, ecx
						;; size=9 bbWeight=1 PerfScore 2.50

G_M22501_IG02:  ;; offset=0x0009
       mov      edx, eax
       mulx     esi, ecx, dword ptr [esp+0x20]
       mov      edx, eax
       mulx     edx, eax, dword ptr [esp+0x24]
       add      eax, esi
       adc      edx, 0
       mov      dword ptr [esp+0x0C], edx
       mov      esi, bword ptr [esp]
       mov      dword ptr [esi], ecx
       mov      dword ptr [esi+0x04], eax
       xor      edx, edx
       mov      eax, dword ptr [esp+0x0C]
						;; size=41 bbWeight=1 PerfScore 16.50

G_M22501_IG03:  ;; offset=0x0032
       add      esp, 24
       pop      esi
       ret      8
						;; size=7 bbWeight=1 PerfScore 2.75
; Total bytes of code: 57
Codegen without BMI2:
; Method Program:<<Main>$>g__BigMul|0_12(ulong,uint,byref):ulong (FullOpts)
G_M59235_IG01:  ;; offset=0x0000
       push     edi
       push     esi
       push     ebx
       sub      esp, 20
       mov      esi, edx
						;; size=8 bbWeight=1 PerfScore 3.50

G_M59235_IG02:  ;; offset=0x0008
       mov      eax, ecx
       mul      edx:eax, dword ptr [esp+0x24]
       mov      edi, eax
       mov      ebx, edx
       mov      eax, ecx
       mul      edx:eax, dword ptr [esp+0x28]
       add      eax, ebx
       adc      edx, 0
       mov      dword ptr [esp+0x08], edx
       mov      dword ptr [esi], edi
       mov      dword ptr [esi+0x04], eax
       xor      edx, edx
       mov      eax, dword ptr [esp+0x08]
						;; size=36 bbWeight=1 PerfScore 16.00

G_M59235_IG03:  ;; offset=0x002C
       add      esp, 20
       pop      ebx
       pop      esi
       pop      edi
       ret      8
						;; size=9 bbWeight=1 PerfScore 3.75
; Total bytes of code: 53

.NET 9 baseline codegen (captured from LINQPad):

L0000: push     edi
L0001: push     esi
L0002: push     ebx
L0003: sub      esp, 0x14
L0006: mov      esi, edx

L0008: mov      eax, [esp+0x24]
L000c: mul      ecx
L000e: mov      edi, eax
L0010: mov      ebx, edx
L0012: mov      eax, [esp+0x28]
L0016: mul      ecx
L0018: add      eax, ebx
L001a: adc      edx, 0
L001d: mov      [esp+8], edx
L0021: or       edi, 0
L0024: or       eax, 0
L0027: mov      [esi], edi
L0029: mov      [esi+4], eax
L002c: xor      edx, edx
L002e: mov      eax, [esp+8]

L0032: add      esp, 0x14
L0035: pop      ebx
L0036: pop      esi
L0037: pop      edi
L0038: ret      8

Division by constant

uint TEstDiv2(uint a)
{
    return a / 10;
}

With BMI2, the following is generated (x86; same behaviour for ulong on x64):

; Method Program:<<Main>$>g__TEstDiv2|0_14(uint):uint (FullOpts)
G_M15534_IG01:  ;; offset=0x0000
						;; size=0 bbWeight=1 PerfScore 0.00

G_M15534_IG02:  ;; offset=0x0000
       mov      edx, 0xCCCCCCCD
       mulx     eax, eax, ecx
       shr      eax, 3
						;; size=13 bbWeight=1 PerfScore 3.75

G_M15534_IG03:  ;; offset=0x000D
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code: 14

Instead of

; Method Program:<<Main>$>g__TEstDiv2|0_14(uint):uint (FullOpts)
G_M15534_IG01:  ;; offset=0x0000
						;; size=0 bbWeight=1 PerfScore 0.00

G_M15534_IG02:  ;; offset=0x0000
       mov      edx, 0xCCCCCCCD
       mov      eax, ecx
       mul      edx:eax, edx
       mov      eax, edx
       shr      eax, 3
						;; size=14 bbWeight=1 PerfScore 4.25

G_M15534_IG03:  ;; offset=0x000E
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code: 15
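
For reference (not part of the PR), both sequences use the standard reciprocal-multiplication trick: 0xCCCCCCCD = ceil(2^35 / 10), the widening multiply leaves the high 32 bits of the 64-bit product in a register, and the final shr by 3 completes the division by 2^35, so that

\left\lfloor \frac{a \cdot \mathtt{0xCCCCCCCD}}{2^{35}} \right\rfloor \;=\; \left\lfloor \frac{a}{10} \right\rfloor \quad \text{for all } 0 \le a < 2^{32}.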

The github-actions bot added the area-CodeGen-coreclr label on Jun 1, 2025.
The dotnet-policy-service bot added the community-contribution label on Jun 1, 2025.
// In lowering, we place any memory operand in op2 so we default to placing op1 in RDX
// By selecting RDX here we don't have to kill it
srcCount = BuildOperandUses(op1, SRBM_RDX);
srcCount += BuildOperandUses(op2, RBM_NONE);
@Daniel-Svensson (Contributor, Author) commented on Jun 1, 2025:

This code is heavily inspired by how MultiplyNoFlags is implemented.

Is it safe to not have RDX killed if SRBM_RDX is specified as the register here?

I hope this produces slightly better code than always killing RDX and allowing any register, since RDX can then be reused.

@Daniel-Svensson (Contributor, Author):

I just found #10196 and the code comment in:

regMaskTP LinearScan::getKillSetForHWIntrinsic(GenTreeHWIntrinsic* node)
{
    regMaskTP killMask = RBM_NONE;
#ifdef TARGET_XARCH
    switch (node->GetHWIntrinsicId())
    {
        case NI_X86Base_MaskMove:
            // maskmovdqu uses edi as the implicit address register.
            // Although it is set as the srcCandidate on the address, if there is also a fixed
            // assignment for the definition of the address, resolveConflictingDefAndUse() may
            // change the register assignment on the def or use of a tree temp (SDSU) when there
            // is a conflict, and the FixedRef on edi won't be sufficient to ensure that another
            // Interval will not be allocated there.
            // Issue #17674 tracks this.
            killMask = RBM_EDI;
            break;

Is that still an issue? (NI_AVX2_MultiplyNoFlags does not do anything similar and still seems to work)

    case GT_MULHI:
    {
        // MUL and IMUL are RMW, but mulx is not (mulx is used for unsigned operands when BMI2 is available)
        return !(tree->IsUnsigned() && compiler->compOpportunisticallyDependsOn(InstructionSet_BMI2));
@Daniel-Svensson (Contributor, Author):

I was thinking about extracting a helper method for determining whether a multiply node should emit mulx, since

tree->OperGet() != GT_MUL && isUnsignedMultiply && compiler->compOpportunisticallyDependsOn(InstructionSet_BMI2)

is used in a few places, but I did not know where to place such a helper, so I did not do it. (A possible shape is sketched below.)
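
A hypothetical shape for such a helper (illustrative only; the name and placement are assumptions, not code from this PR):

// Hypothetical helper centralizing the repeated condition quoted above:
// mulx only applies to the widening forms (GT_MULHI / GT_MUL_LONG), only to
// unsigned multiplies, and only when BMI2 is available.
bool ShouldEmitMulxForMultiply(Compiler* compiler, GenTree* tree)
{
    return !tree->OperIs(GT_MUL) && tree->IsUnsigned() &&
           compiler->compOpportunisticallyDependsOn(InstructionSet_BMI2);
}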

@Daniel-Svensson (Contributor, Author):

Would it make sense to look at using mulx when the operands are signed but proven to be non-negative?

If so, where would such a helper belong, and do you have a suggestion for the name? Perhaps shouldEmitMulxForMultiplication?

@Daniel-Svensson Daniel-Svensson marked this pull request as ready for review June 1, 2025 21:48
* use OperIs()
* replace Intructionset_BMI2 => InstructionSetAVX2
@Daniel-Svensson (Contributor, Author) commented:

@JulieLeeMSFT, @jakobbotsch, it seems the dotnet-policy-service tags you on new PRs for this area; should I have mentioned you, since it did not mention anyone?

@jakobbotsch (Member) commented:

Not sure why it didn't happen here; let me ping the rest of the team.

cc @dotnet/jit-contrib

The diffs for the PR look quite mixed. Is it expected?

@Daniel-Svensson (Contributor, Author) commented:

@jakobbotsch

The diffs for the PR look quite mixed. Is it expected?

I cannot access the diffs, so it is hard to tell.
Would you be able to post a screenshot?

Some observations from the diffs when I did the work:

  • Code size diffs depend heavily on register pressure.
    • mul has a smaller encoding than mulx, so roughly two movs must be removed before code size decreases.
    • The real gains (code size and perf) come when stack spills / memory reads and writes are avoided.
    • I expect some size regressions, but no performance regressions (unless caused by code alignment).
  • Register assignment can become very different, which by itself causes diffs (at least large textual ones).
    • Even with more registers usable, I believe I had a case where it spilled to memory even though temp registers such as r11 were available. (That was many weeks ago, related to my other mulx PR, but the principle would be the same.)
    • I do not remember whether it was rax or rdx, but without mulx the variable was placed in some other register.

@jakobbotsch (Member) commented:

I cannot access the diffs, so it is hard to tell.

What error do you get? You should be able to see the diffs just fine. E.g. I can see them even in incognito.

@Daniel-Svensson (Contributor, Author) commented:

What error do you get? You should be able to see the diffs just fine. E.g. I can see them even in incognito.

Thank you for the incognito tip; I was sent to my Microsoft account login and then got forbidden access (probably since I often use DevOps for other projects).

The diffs do look worse than expected; register allocation seems to run into problems when fixed registers are defined for uses.
I am a bit surprised by the result for the following code (I would have expected it to maybe use another temp register for storing).

My assumption is that the stack spill / regression below is caused by not having RDX killed and by using BuildOperandUses(op1, SRBM_RDX).

  • It seems like just killing RDX and not fixing the input register might be less problematic.
    I will push some changes and see what happens.
static long mul2s(int a, int b)
{
    return (long)a * (long)b;
}

generates

; Method Program:<<Main>$>g__mul2|0_13(uint,uint):ulong (FullOpts)
G_M14403_IG01:  ;; offset=0x0000
       sub      esp, 12
       mov      dword ptr [esp+0x08], edx
						;; size=7 bbWeight=1 PerfScore 1.25

G_M14403_IG02:  ;; offset=0x0007
       mov      edx, ecx
       mulx     edx, eax, dword ptr [esp+0x08]
						;; size=9 bbWeight=1 PerfScore 5.25

G_M14403_IG03:  ;; offset=0x0010
       add      esp, 12
       ret      
						;; size=4 bbWeight=1 PerfScore 1.25
; Total bytes of code: 20

Full JITDUMP can be found here

@Daniel-Svensson (Contributor, Author) commented on Jun 14, 2025:

@jakobbotsch
I switched to killing RDX instead of specifying it as a fixed register on the use, and the new diffs look much more like what I expected.

There are mostly perf and size improvements, but there are a few regressions that I had not expected, such as System.Decimal+DecCalc:DecDivMod1E9 (example diff under benchmarks.run.linux.x64.checked.mch).

It seems a bit unintuitive to me, but it looks like it spills a variable to the stack just because different registers are allocated (it uses rax in a different way since it is no longer killed by mul).

Is there any change I should make to this PR, or is it something more general? Perhaps it would be better to use volatile registers such as r9 to store data instead of spilling to the stack.

  public static uint DecDivMod1E9(ref DecCalc value)
  {
      ulong high64 = ((ulong)value.uhi << 32) + value.umid;
      ulong div64 = high64 / TenToPowerNine;
      value.uhi = (uint)(div64 >> 32);
      value.umid = (uint)div64;

      ulong num = ((high64 - (uint)div64 * TenToPowerNine) << 32) + value.ulo;
      uint div = (uint)(num / TenToPowerNine);
      value.ulo = div;
      return (uint)num - div * TenToPowerNine;
  }

UPDATE: I think I may have found it; rax seemed to be a fixed register used by another operation during lowering of division by constant.

@Daniel-Svensson (Contributor, Author) commented on Jun 15, 2025:

Update: I think I fixed the rax spill issue, and the diffs now look as expected for x64.

There are a few more regressions on x86 than expected (even though the total is an improvement), but the few I looked at seem more likely to be caused by different register usage.

For example, System.Numerics.BigIntegerCalculator:Multiply(System.ReadOnlySpan`1[uint],uint,System.Span`1[uint]) (FullOpts), which during crossgen emits mul against memory instead of a temp, seems to end up with additional register usage and movs instead of fewer.
Maybe the same happens to System.Number:<NumberToBigInteger>g__MultiplyAdd|

if (mulNode->IsUnsigned() && compiler->compOpportunisticallyDependsOn(InstructionSet_AVX2))
{
    // If one operand is used from memory, we define a fixed RDX register for the use, so we don't need to kill it.
    if (mulNode->gtGetOp1()->isUsedFromMemory() || mulNode->gtGetOp2()->isUsedFromMemory())
Member:

It's not ok to use isUsedFromMemory during LSRA, only in codegen. We only know for sure after LSRA, due to the spill-temps case it handles.

You can check for the contained memory op case though.

@Daniel-Svensson (Contributor, Author):

I switched to isContained() both here and in LinearScan::BuildMul, roughly the shape sketched below.
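
An illustrative sketch of the adjusted check (simplified; not the exact code from the PR):

// During LSRA we can only rely on containment: whether an operand is ultimately
// used from memory (e.g. a spill temp) is not known until after register allocation.
if (mulNode->gtGetOp1()->isContained() || mulNode->gtGetOp2()->isContained())
{
    // handle the contained (memory) operand case
}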

@Daniel-Svensson (Contributor, Author) commented on Jun 16, 2025:

That did make a nice difference on x86; it went from a size regression to a size improvement: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1069175&view=ms.vss-build-web.run-extensions-tab

However, I had hoped that the containment support for BigMul would give a larger improvement.
I do wonder about the following part of the diff:
Should it not have the same perfscore? Maybe that is where some of the perfscore increase for crossgen comes from.

[image: screenshot of the diff]

Contributor:

I've also noticed that containment seems to actually increase perfscore, while in reality it's most likely an improvement.

Member:

It comes down to the insThroughput considerations being different between them, so the costing returned by getInsExecutionCharacteristics isn't quite correct.

The general issue here is rather that we're effectively modeling it as a single uop, when in actuality a contained load/embed is an additional uop on top.

So the standalone mov eax, dword ptr [ecx+0x04] is going to be throughput: 2x, latency: PERFSCORE_LATENCY_RD_*, and the imul is going to be throughput: 1x, latency: 3C, while the contained form is going to be throughput: 1x, latency: 3C + PERFSCORE_LATENCY_RD_*.

However, in practice the load portion is still throughput: 2x and can be pipelined, since it's decomposed into its own uop. It just forms part of the dependency chain directly with the subsequent instruction.

We could probably move the PERFSCORE_LATENCY_RD_* handling "up" so that we can reasonably handle contained loads while still accurately tracking the throughput.

CC @AndyAyersMS in case he has any alternative ideas or input.

@jakobbotsch (Member) left a review comment:

LGTM.
@tannergooding can you take a look as well?

{
containedMemOp = op2;
assert(!(op1->isContained() && !op1->IsCnsIntOrI()) || !(op2->isContained() && !op2->IsCnsIntOrI()));
srcCount = BuildRMWUses(tree, op1, op2, RBM_NONE, RBM_NONE);
@tannergooding (Member) commented on Jun 18, 2025:

nit: This doesn't actually have to be RMW for imul reg1, reg2/mem, imm8/16/32 (which does reg1 = reg2/mem * imm)

It is only that way for imul reg1, reg2/mem (which does reg1 *= reg2/mem) and for imul reg1/mem (which does edx:eax = eax * reg1/mem).

@Daniel-Svensson (Contributor, Author):

I switched back to BuildBinaryUses, so it should behave exactly as before this PR for mul/imul

@tannergooding (Member) left a review:

LGTM. A couple of small typos and a nit about imul with an 8-, 16-, or 32-bit sign-extended immediate not being RMW.

@tannergooding merged commit ede0118 into dotnet:main on Jun 20, 2025.
112 of 114 checks passed.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
4 participants