-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Implement the reuse of duplicate constant values #43899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
briansull
commented
Oct 27, 2020
- Duplicate values will now be reused in the constants area
- Added MIN_DATA_ALIGN and MAX_DATA_ALIGN constants
- Added dsDataType to record the type of the constant
- Print the floating point values in the constants area
- Changed 'alignment' and 'cnsSize' parameters to be unsigned instead of UNATIVE_OFFSET
- Added method emitDataGenFind to search and return a duplicate constant value
- Duplicate values will now be reused in the constants area - Added MIN_DATA_ALIGN and MAX_DATA_ALIGN constants - Added dsDataType to record the type of the constant - Print the floating point values in the constants area - Changed 'alignment' and 'cnsSize' parameters to be unsigned instead of UNATIVE_OFFSET - Added method emitDataGenFind to search and return a duplicate constant value
Fixes #42178 |
Sample diffs:
|
@BruceForstall @CarolEidt PTAL |
UNATIVE_OFFSET cnum = emitDataGenBeg(cnsSize, cnsAlign); | ||
emitDataGenData(0, cnsAddr, cnsSize); | ||
emitDataGenEnd(); | ||
UNATIVE_OFFSET cnum = emitDataGenFind(cnsAddr, cnsSize, cnsAlign, dataType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've introduced an O(n) search for each constant. Presumably, you could construct a function with a large number of constants where this might show up on a trace. It seems like the usual case would be very few searches, but the usual case is also probably very few matches (duplicates). Would be interesting to see some stats about this across our SPMI collections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand that. There are very few constants used by the JIT, so I didn't think it was worth adding a hash table for this. I can add a check that counts the total number and asserts (in a checked JIT) when it exceeds a certain limit, (like 10,000 or so)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's worth an assert. I suppose the search function could bail out after, say, 100 constants and just always add a new (possibly duplicated) constant after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is another possibility. Let me run the tests and SPMI to see if we even have this case today.
Using 10k we would hits this after have 142 unique constants and/or switch tables in a single method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I settled on only checking for matches in the first 64 entries and the adding the new value if is doesn't match
// If we don't find a match in the first 64, then we just add the new constant
// This prevents an O(n^2) search cost
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM - I'm not sure exactly how the dump looks now, but it seems to be a good improvement - thanks!
I'd agree with Bruce's suggestion to consider a limit on the number of reuse constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks!
src/coreclr/src/jit/emit.cpp
Outdated
// This restricts the alignment to: 1, 2, 4, 8, 16, or 32 bytes | ||
// Alignments greater than 32 would require VM support in ICorJitInfo::allocMem | ||
// This restricts the alignment to MAX_DATA_ALIGN | ||
// Alignments greater than 32 also require VM support in ICorJitInfo::allocMem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Alignments greater than 32 also require VM support in ICorJitInfo::allocMem | |
// Alignments greater than MAX_DATA_ALIGN also require VM support in ICorJitInfo::allocMem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually you have add support for alignments greater than 32 to ICorJitInfo::allocMem, before you can increase the emitter's MAX_DATA_ALIGN. If we ever do that we may also move the definitions of MAX/MIN_DATA_ALIGN out of emit.h and into compiler.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the comment possibly be moved to be on MAX_DATA_ALIGN
and indicate something like "this can't be increased without also touching ICorJitInfo::allocMem
"
src/coreclr/src/jit/emit.cpp
Outdated
{ | ||
// Search the existing secDesc entries | ||
|
||
if ((secDesc->dsSize == cnsSize) && (secDesc->dsDataType == dataType) && ((curOffs % alignment) == 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not just match the raw integral bits rather than needing to check "dataType"?
For example, if you had a ulong
constant that was 0x8000_0000_0000_0000
you could also use it for a double
constant that was -0.0
as they have the same 64-bit pattern. This would likely be useful for scenarios where you are doing floating-point algorithms that are manipulating the underlying bits.
Likewise, I imagine this might be less efficient for certain SIMD code. If you emit a SIMD constant for <0.0f, 1.0f, 2.0f, 3.0f>
and also have a floating-point constant for 1.0f
then we will have both emitted separately, rather than just having the latter load from element 1 of the SIMD constant.
These might just be future optimization opportunities, but I wanted to call them out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could make that change. It would probably match floating point zero to integer zero frequently.
I also could do (secDesc->dsSize >= cnsSize)
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably match floating point zero to integer zero frequently
I think this is probably a rare case. On ARM there is the explicit zero register and on x86 I think we prefer xor reg, reg
and xorps reg, reg
as they are both smaller code and elided by the instruction decoder.
switch (dsc->dsDataType) | ||
{ | ||
case TYP_FLOAT: | ||
printf(" ; float %9.6g", (double)*reinterpret_cast<float*>(&dsc->dsCont)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this converting to double
for printf
rather than just keeping it as float
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK printf() only supports the printing of doubles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The C calling convention only passes doubles, not floats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍, I'm definitely not as up to date with what the official standards support.
A simple MSVC console app, however, looks to work and Clang/GCC don't give any warnings on compile.
printf(" ; float %9.6g", (double)*reinterpret_cast<float*>(&dsc->dsCont)); | ||
break; | ||
case TYP_DOUBLE: | ||
printf(" ; double %12.9g", *reinterpret_cast<double*>(&dsc->dsCont)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it would be better to use .17g
which guarantees it is uniquely representable (but may be overly verbose) or also print the underlying 64-bits as hex. As it is now, this will cause values to look identical in the log when they may not be in practice.
- In C#, I typically do
${x} (0x{BitConverter.DoubleToInt64Bits(x):X16}"
, but we also ensuresdouble.ToString
andfloat.ToString
are the "shortest roundtrippable string" by default now.
C++ 17 has std::to_chars which also ensures the result is the "shortest roundtrippable string" (provided it is reparsed by std::from_chars
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have printed the binary value.
Additionally we print the floating point value as a comment in the Asm output to help the user see what value is being loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍, I saw us printing the binary value elsewhere but wasn't sure where in the dump that was relative to this.
It would still be nice if we could ensure that values don't conflict in other places though (IMO).
For example, trimming double
to 9 digits will result in both of the following looking equal:
3.141592653589793 (0x400921FB54442D18) -- Math.PI
3.14159265 (0x400921FB53C8D4F1)
However, they have 8 million uniquely representable values that exist between them.
Allow binary bit pattern matches of constants
80f63b4
to
56bf515
Compare
Updated with Tanner's suggestions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks for the improvements!
@BruceForstall You are still listed as requested changes (which I have made) |