Skip to content

Implement the reuse of duplicate constant values #43899

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Oct 30, 2020

Conversation

briansull
Copy link
Contributor

  • Duplicate values will now be reused in the constants area
  • Added MIN_DATA_ALIGN and MAX_DATA_ALIGN constants
  • Added dsDataType to record the type of the constant
  • Print the floating point values in the constants area
  • Changed 'alignment' and 'cnsSize' parameters to be unsigned instead of UNATIVE_OFFSET
  • Added method emitDataGenFind to search and return a duplicate constant value

- Duplicate values will now be reused in the constants area
- Added MIN_DATA_ALIGN and MAX_DATA_ALIGN constants
- Added dsDataType to record the type of the constant
- Print the floating point values in the constants area
- Changed 'alignment' and 'cnsSize' parameters to be unsigned instead of UNATIVE_OFFSET
- Added method emitDataGenFind to search and return a duplicate constant value
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 27, 2020
@briansull
Copy link
Contributor Author

Fixes #42178

@briansull
Copy link
Contributor Author

briansull commented Oct 27, 2020

Sample diffs:

; Assembly listing for method Number:TryParseDouble(ReadOnlySpan`1,int,NumberFormatInfo,byref):bool
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
; ReadyToRun compilation
; optimized code
; rbp based frame
; partially interruptible

G_M10185_IG33:
<       movsd    xmm0, qword ptr [reloc @RWD24]
>       movsd    xmm0, qword ptr [reloc @RWD00]

G_M10185_IG39:
<       movsd    xmm0, qword ptr [reloc @RWD32]
>       movsd    xmm0, qword ptr [reloc @RWD16]

G_M10185_IG51:
<       movsd    xmm0, qword ptr [reloc @RWD40]
>       movsd    xmm0, qword ptr [reloc @RWD16]

<RWD00  dq	7FF0000000000000h
<RWD08  dq	FFF0000000000000h
<RWD16  dq	FFF8000000000000h
<RWD24  dq	7FF0000000000000h
<RWD32  dq	FFF8000000000000h
<RWD40  dq	FFF8000000000000h

>RWD00  	dq	7FF0000000000000h		;          inf
>RWD08  	dq	FFF0000000000000h		;         -inf
>RWD16  	dq	FFF8000000000000h		;    -nan(ind)

@briansull
Copy link
Contributor Author

@BruceForstall @CarolEidt PTAL

UNATIVE_OFFSET cnum = emitDataGenBeg(cnsSize, cnsAlign);
emitDataGenData(0, cnsAddr, cnsSize);
emitDataGenEnd();
UNATIVE_OFFSET cnum = emitDataGenFind(cnsAddr, cnsSize, cnsAlign, dataType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've introduced an O(n) search for each constant. Presumably, you could construct a function with a large number of constants where this might show up on a trace. It seems like the usual case would be very few searches, but the usual case is also probably very few matches (duplicates). Would be interesting to see some stats about this across our SPMI collections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand that. There are very few constants used by the JIT, so I didn't think it was worth adding a hash table for this. I can add a check that counts the total number and asserts (in a checked JIT) when it exceeds a certain limit, (like 10,000 or so)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's worth an assert. I suppose the search function could bail out after, say, 100 constants and just always add a new (possibly duplicated) constant after that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is another possibility. Let me run the tests and SPMI to see if we even have this case today.
Using 10k we would hits this after have 142 unique constants and/or switch tables in a single method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I settled on only checking for matches in the first 64 entries and the adding the new value if is doesn't match

// If we don't find a match in the first 64, then we just add the new constant
// This prevents an O(n^2) search cost

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM - I'm not sure exactly how the dump looks now, but it seems to be a good improvement - thanks!
I'd agree with Bruce's suggestion to consider a limit on the number of reuse constants.

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!

// This restricts the alignment to: 1, 2, 4, 8, 16, or 32 bytes
// Alignments greater than 32 would require VM support in ICorJitInfo::allocMem
// This restricts the alignment to MAX_DATA_ALIGN
// Alignments greater than 32 also require VM support in ICorJitInfo::allocMem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Alignments greater than 32 also require VM support in ICorJitInfo::allocMem
// Alignments greater than MAX_DATA_ALIGN also require VM support in ICorJitInfo::allocMem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you have add support for alignments greater than 32 to ICorJitInfo::allocMem, before you can increase the emitter's MAX_DATA_ALIGN. If we ever do that we may also move the definitions of MAX/MIN_DATA_ALIGN out of emit.h and into compiler.h

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the comment possibly be moved to be on MAX_DATA_ALIGN and indicate something like "this can't be increased without also touching ICorJitInfo::allocMem"

{
// Search the existing secDesc entries

if ((secDesc->dsSize == cnsSize) && (secDesc->dsDataType == dataType) && ((curOffs % alignment) == 0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not just match the raw integral bits rather than needing to check "dataType"?

For example, if you had a ulong constant that was 0x8000_0000_0000_0000 you could also use it for a double constant that was -0.0 as they have the same 64-bit pattern. This would likely be useful for scenarios where you are doing floating-point algorithms that are manipulating the underlying bits.

Likewise, I imagine this might be less efficient for certain SIMD code. If you emit a SIMD constant for <0.0f, 1.0f, 2.0f, 3.0f> and also have a floating-point constant for 1.0f then we will have both emitted separately, rather than just having the latter load from element 1 of the SIMD constant.

These might just be future optimization opportunities, but I wanted to call them out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could make that change. It would probably match floating point zero to integer zero frequently.
I also could do (secDesc->dsSize >= cnsSize) as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably match floating point zero to integer zero frequently

I think this is probably a rare case. On ARM there is the explicit zero register and on x86 I think we prefer xor reg, reg and xorps reg, reg as they are both smaller code and elided by the instruction decoder.

switch (dsc->dsDataType)
{
case TYP_FLOAT:
printf(" ; float %9.6g", (double)*reinterpret_cast<float*>(&dsc->dsCont));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this converting to double for printf rather than just keeping it as float?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK printf() only supports the printing of doubles

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C calling convention only passes doubles, not floats

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍, I'm definitely not as up to date with what the official standards support.

A simple MSVC console app, however, looks to work and Clang/GCC don't give any warnings on compile.

printf(" ; float %9.6g", (double)*reinterpret_cast<float*>(&dsc->dsCont));
break;
case TYP_DOUBLE:
printf(" ; double %12.9g", *reinterpret_cast<double*>(&dsc->dsCont));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it would be better to use .17g which guarantees it is uniquely representable (but may be overly verbose) or also print the underlying 64-bits as hex. As it is now, this will cause values to look identical in the log when they may not be in practice.

  • In C#, I typically do ${x} (0x{BitConverter.DoubleToInt64Bits(x):X16}", but we also ensures double.ToString and float.ToString are the "shortest roundtrippable string" by default now.

C++ 17 has std::to_chars which also ensures the result is the "shortest roundtrippable string" (provided it is reparsed by std::from_chars).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have printed the binary value.
Additionally we print the floating point value as a comment in the Asm output to help the user see what value is being loaded.

Copy link
Member

@tannergooding tannergooding Oct 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍, I saw us printing the binary value elsewhere but wasn't sure where in the dump that was relative to this.

It would still be nice if we could ensure that values don't conflict in other places though (IMO).
For example, trimming double to 9 digits will result in both of the following looking equal:

3.141592653589793 (0x400921FB54442D18) -- Math.PI
3.14159265        (0x400921FB53C8D4F1)

However, they have 8 million uniquely representable values that exist between them.

Allow binary bit pattern matches of constants
@briansull
Copy link
Contributor Author

briansull commented Oct 29, 2020

Updated with Tanner's suggestions

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Thanks for the improvements!

@briansull
Copy link
Contributor Author

@BruceForstall You are still listed as requested changes (which I have made)

@briansull briansull merged commit 4d09ba6 into dotnet:master Oct 30, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RyuJIT: inst_RV_RV_TT doesn't check if constants already exists in the data section
5 participants