Enable feature stubprecode dynamic helpers again #115746
… stream of assembly data instead of the current approach.
…e DAC. cDAC isn't done yet, but it should be simpler than the confusion that is today's implementation.
Tagging subscribers to this area: @mangod9
…STUBPRECODE_DYNAMIC_HELPERS_again
…STUBPRECODE_DYNAMIC_HELPERS_again
/azp list
/azp run runtime-diagnostics
Azure Pipelines successfully started running 1 pipeline(s).
Pull Request Overview
This PR adds support for dynamic helper stub precodes (version 3) by switching to a full-byte-pattern comparison approach and integrates it into both the managed contracts and native runtime metadata.
- Introduces `PrecodeStubs_3_Impl` with `ReadBytesAndCompare` for stub/fixup detection and updates the factory to return version 3.
- Extends `PrecodeMachineDescriptor` (native) with full stub/fixup byte arrays and ignore masks, and initializes them in `CDacPlatformMetadata`.
- Updates allocation and detection in `precode.h/.cpp`, VM startup, DAC tables, and ARM64 thunk templates to use the new patterns.
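For readers unfamiliar with the "full byte pattern plus ignore mask" idea in the overview, here is a minimal C++ sketch. It is not the PR's code: `StubPattern`, `MatchesPattern`, and the mask polarity (0xFF means "ignore this byte") are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical descriptor: the expected stub bytes plus a same-length mask.
// Assumed polarity: a set mask bit means "ignore this bit when comparing".
struct StubPattern {
    std::vector<std::uint8_t> bytes;
    std::vector<std::uint8_t> ignoreMask;
};

// Returns true when every non-ignored bit of `code` equals the pattern.
bool MatchesPattern(const std::uint8_t* code, std::size_t codeLen, const StubPattern& p) {
    assert(p.bytes.size() == p.ignoreMask.size());
    if (codeLen < p.bytes.size()) {
        return false;  // not enough bytes to hold the whole stub
    }
    for (std::size_t i = 0; i < p.bytes.size(); ++i) {
        const std::uint8_t keep = static_cast<std::uint8_t>(~p.ignoreMask[i]);
        if ((code[i] & keep) != (p.bytes[i] & keep)) {
            return false;  // a byte that must match differs
        }
    }
    return true;
}
```

The point of comparing the whole stub is that no per-architecture "magic offset" has to be kept in sync when an instruction such as a barrier is added to the template; only the pattern and mask change.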
Reviewed Changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 5 comments.
File | Description |
---|---|
PrecodeStubs_3.cs | New v3 contract API using full-byte comparisons |
PrecodeStubs_2.cs, PrecodeStubs_1.cs | Ensure v2/v1 chaining still works |
PrecodeStubsFactory.cs | Register version 3 in factory switch |
readytoruninfo.cpp, loaderallocator.hpp | Switch dynamic helper allocation to AllocStub() |
precode.h, precode.cpp | Remove magic offsets, add Is*ByASM_DAC, use full-byte compare |
cdacplatformmetadata.{hpp,cpp}, dacvars.h, datadescriptor.h | Add new DAC fields & initialization for byte-pattern data |
arm64 thunktemplates.* | Insert memory-barrier (dmb ishld) for stubs |
contracts.jsonc | Bump PrecodeStubs contract version to 3 |
dactable.cpp, enummem.cpp | Include new platform metadata header |
clrfeatures.cmake | Enable FEATURE_STUBPRECODE_DYNAMIC_HELPERS on AMD64/ARM64 |
PrecodeStubs.md | Document v3 byte-pattern approach |
#endif
}

void CDacPlatformMetadata::InitPrecodes()
{
    PrecodeMachineDescriptor::Init(&(&g_cdacPlatformMetadata)->precode);
The expression `&(&g_cdacPlatformMetadata)->precode` is overly verbose. It can be simplified to `&g_cdacPlatformMetadata.precode` for clarity.
Suggested change:
- PrecodeMachineDescriptor::Init(&(&g_cdacPlatformMetadata)->precode);
+ PrecodeMachineDescriptor::Init(&g_cdacPlatformMetadata.precode);
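A small C++ check of the equivalence this suggestion relies on. The two struct definitions are hypothetical stand-ins, not the runtime's types, and the equivalence is only shown for an ordinary global object. It would not necessarily hold if the global were a wrapper type with an overloaded `operator&` or `operator->`, which this sketch does not model.

```cpp
#include <cassert>

// Hypothetical stand-ins for the runtime types; only the shape matters here.
struct PrecodeMachineDescriptor { int placeholder; };
struct CDacPlatformMetadata {
    int other;
    PrecodeMachineDescriptor precode;
};

// An ordinary global object (assumption: no overloaded address-of operator).
CDacPlatformMetadata g_cdacPlatformMetadata{};

// Both spellings name the same member, so they yield the same pointer.
PrecodeMachineDescriptor* VerboseForm() { return &(&g_cdacPlatformMetadata)->precode; }
PrecodeMachineDescriptor* SimpleForm()  { return &g_cdacPlatformMetadata.precode; }
```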
Co-authored-by: Copilot <[email protected]>
/azp run runtime-diagnostics
Azure Pipelines successfully started running 1 pipeline(s).
Any numbers for the worst-case perf regressions that we expect to get from this change?
    ldr x10, DATA_SLOT(StubPrecode, Target)
    ldr x12, DATA_SLOT(StubPrecode, SecretParam)
    br x10
    brk 0xf000 // Stubs need to be 24-byte in size to allow for the data to be 3 pointers
    brk 0xf000 // Stubs need to be 24-byte in size to allow for the data to be 3 pointers
    LEAF_END_MARKED StubPrecodeCode\STUB_PAGE_SIZE

    LEAF_ENTRY FixupPrecodeCode\STUB_PAGE_SIZE
Why can we skip the barrier for FixupPrecode?
@@ -120,9 +120,12 @@ LEAF_END_MARKED CallCountingStubCodeTemplate, _TEXT
    .irp STUB_PAGE_SIZE, 16384, 32768, 65536
Does `FEATURE_MAP_THUNKS_FROM_IMAGE` above need the same change?
Sigh, I missed that part of the merge. Yes.
@@ -120,9 +120,12 @@ LEAF_END_MARKED CallCountingStubCodeTemplate, _TEXT
    .irp STUB_PAGE_SIZE, 16384, 32768, 65536

    LEAF_ENTRY StubPrecodeCode\STUB_PAGE_SIZE
    dmb ishld
Do we need a counterpart barrier on the writer side (e.g. in DynamicHelperFixup)?
Yes. I had written... or at least thought about writing it. I'll have that put together soon.
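To make the reader/writer pairing concrete, here is a C++ memory-model analogy, offered as a sketch rather than the runtime's implementation. `StubData`, `PublishStub`, and `TryReadStub` are hypothetical names. The acquire fence stands in for the barrier at stub entry (on ARM64 an acquire fence is typically emitted as `dmb ishld`), and the release fence stands in for the counterpart barrier on the writer side.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a stub's data slots.
struct StubData {
    std::uintptr_t target = 0;
    std::uintptr_t secretParam = 0;
};

StubData g_data;                              // filled in by the writer
std::atomic<StubData*> g_published{nullptr};  // how a reader discovers the data

// Writer side: fill the data, fence, then make the stub reachable.
void PublishStub(std::uintptr_t target, std::uintptr_t secretParam) {
    g_data.target = target;
    g_data.secretParam = secretParam;
    std::atomic_thread_fence(std::memory_order_release);  // writer-side barrier
    g_published.store(&g_data, std::memory_order_relaxed);
}

// Reader side: observe the pointer, fence, then read the data.
bool TryReadStub(StubData& out) {
    StubData* p = g_published.load(std::memory_order_relaxed);
    if (p == nullptr) {
        return false;  // nothing published yet
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // reader-side barrier
    out = *p;
    return true;
}
```

Without the release fence the writer's data stores could become visible after the pointer store, so a reader-side barrier alone would not be enough. That is the gap the question above points at.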
@@ -30,11 +30,11 @@
    IN_PAGE_INDEX = 0
    .rept STUB_PRECODE_NUM_THUNKS_PER_MAPPING

    dmb ishld
Do we need a separate barrier instruction? Would it be possible to use an ldar instruction instead of the ldr to achieve the same thing?
It would need to be an ldar that loads the address of the stub, not an ldar in the stub.
…STUBPRECODE_DYNAMIC_HELPERS_again
@dotnet/samsung Could you please take a look? These changes may be related to riscv64.
RISC-V Release-CLR-QEMU: 0 / 9117 (0.00%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-FX-QEMU: 0 / 262 (0.00%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-CLR-VF2: 0 / 9116 (0.00%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-FX-VF2: 0 / 262 (0.00%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
After running tests on riscv64, everything ended up with SEGFAULT :(
Co-authored-by: Tomasz Sowiński <[email protected]>
RISC-V Release-CLR-QEMU: 0 / 9119 (0.00%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-FX-QEMU: 0 / 262 (0.00%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-CLR-VF2: 0 / 9119 (0.00%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
RISC-V Release-FX-VF2: 0 / 262 (0.00%)
report.xml, report.md, failures.xml, testclr_details.tar.zst
Removing the fences from `src/coreclr/vm/riscv64/thunktemplates.S` will fix the problems on rv64, but it's only a temporary solution. I'm looking for the reason why it doesn't work, but at the moment, after jumping in CallDescrWorkerInternal to an invalid pTarget, gdb loses track.
Try this; fixing 5 more load offsets got the tests running on my side.
    fence r,rw
    auipc t1, 0x4
    ld t2, (StubPrecodeData__SecretParam)(t1)
    ld t1, (StubPrecodeData__Target)(t1)
Suggested change:
- fence r,rw
- auipc t1, 0x4
- ld t2, (StubPrecodeData__SecretParam)(t1)
- ld t1, (StubPrecodeData__Target)(t1)
+ fence r,rw
+ auipc t1, 0x4
+ ld t2, (StubPrecodeData__SecretParam - 0x4)(t1)
+ ld t1, (StubPrecodeData__Target - 0x4)(t1)
    fence r,rw
    auipc t2, 0x4
    ld t3, (CallCountingStubData__RemainingCallCountCell)(t2)
Suggested change:
- fence r,rw
- auipc t2, 0x4
- ld t3, (CallCountingStubData__RemainingCallCountCell)(t2)
+ fence r,rw
+ auipc t2, 0x4
+ ld t3, (CallCountingStubData__RemainingCallCountCell - 0x4)(t2)
Please also subtract 0x4 from the two analogous loads at lines 35 and 38 (GH suggestions don't reach there).
coreclr tests passed, src/coreclr/vm/riscv64/thunktemplates.S
LEAF_ENTRY StubPrecodeCode
fence r, rw
auipc t1, 0x4
ld t2, (StubPrecodeData__SecretParam - 0x4)(t1)
ld t1, (StubPrecodeData__Target - 0x4)(t1)
jr t1
LEAF_END_MARKED StubPrecodeCode
LEAF_ENTRY FixupPrecodeCode
auipc t2, 0x4
ld t2, (FixupPrecodeData__Target)(t2)
c.jr t2
fence r, rw
auipc t2, 0x4
ld t1, (FixupPrecodeData__PrecodeFixupThunk - 0xe)(t2)
ld t2, (FixupPrecodeData__MethodDesc - 0xe)(t2)
jr t1
LEAF_END_MARKED FixupPrecodeCode
LEAF_ENTRY CallCountingStubCode
fence r, rw
auipc t2, 0x4
ld t3, (CallCountingStubData__RemainingCallCountCell - 0x4)(t2)
lh t1, 0(t3)
addiw t1, t1, -1
sh t1, 0(t3)
beq t1, zero, LOCAL_LABEL(CountReachedZero)
ld t1, (CallCountingStubData__TargetForMethod - 0x4)(t2)
jr t1
LOCAL_LABEL(CountReachedZero):
ld t1, (CallCountingStubData__TargetForThresholdReached - 0x4)(t2)
jr t1
LEAF_END_MARKED CallCountingStubCode
EDIT: all corefx tests reported stack overflow. I'm not certain whether this is still only an rv64 problem.
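The `- 0x4` and `- 0xe` corrections in the listing above follow from how `auipc` works: it adds its immediate (shifted left by 12) to the address of the `auipc` instruction itself, not to the start of the stub. The C++ sketch below only checks that arithmetic. The helper names are hypothetical, and it assumes the data page sits exactly `imm20 << 12` bytes after the start of the stub, which is what `auipc ..., 0x4` suggests for a 16 KiB stub page.

```cpp
#include <cassert>
#include <cstdint>

// RISC-V `auipc rd, imm20` computes rd = pc + (imm20 << 12).
constexpr std::uint64_t Auipc(std::uint64_t pc, std::uint64_t imm20) {
    return pc + (imm20 << 12);
}

// Displacement to encode in the following `ld` so that it reaches
// `fieldOffset` in the data page, when the auipc sits
// `auipcOffsetInStub` bytes after the start of the stub.
constexpr std::int64_t LoadDisplacement(std::int64_t fieldOffset,
                                        std::int64_t auipcOffsetInStub) {
    return fieldOffset - auipcOffsetInStub;
}

// Effective address of the load, used to sanity-check the displacement.
constexpr std::uint64_t EffectiveAddress(std::uint64_t stubStart,
                                         std::int64_t auipcOffsetInStub,
                                         std::uint64_t imm20,
                                         std::int64_t fieldOffset) {
    return Auipc(stubStart + auipcOffsetInStub, imm20)
         + LoadDisplacement(fieldOffset, auipcOffsetInStub);
}

// A 4-byte fence ahead of the auipc shifts the auipc to offset 4.
static_assert(LoadDisplacement(0, 4) == -0x4, "fence before auipc");
// FixupPrecode slow path: auipc (4) + ld (4) + c.jr (2) + fence (4) = 0xe.
static_assert(LoadDisplacement(0, 4 + 4 + 2 + 4) == -0xe, "second auipc in FixupPrecode");
```

This is why inserting `fence r, rw` before an existing `auipc` silently breaks every load that follows it unless each displacement is reduced by the size of the inserted instruction.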
Use the barrier approach to make this safe. Adding an address dependency got into lots of details around possibly needing an entirely new stub type, and several paths through this code already depend on the concept of having a barrier here. To support modifying the stubs without inserting lots of magic numbers, rework how stubs are identified to do a complete compare of the entire assembly stub, in a way that will be easy to generalize to the cDAC.
In addition, make changes to make the FixupPrecode path safe from the same memory ordering concerns. Since the FixupPrecode always goes through a specified entrypoint, we can slide the barrier into only the slow path, as long as we ensure that the Target field is written with a VolatileStore after the `MethodDesc` and `PrecodeFixupThunk` fields are set. To ensure that the barrier on the writer side has happened before we hit the barrier on the reader, we ensure that the Target field is initialized to an infinite loop before the code associated with the FixupPrecode is mapped into the process.
Fixes #113810
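The FixupPrecode ordering described above can be modeled in portable C++ as follows. This is a sketch under stated assumptions, not the runtime's code: `FixupPrecodeDataModel`, `kSpinPlaceholder`, `PreInit`, `InitFixupPrecode`, and `TryEnterSlowPath` are hypothetical, and `VolatileStore` is modeled as a release store.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of the three FixupPrecode data fields named above.
struct FixupPrecodeDataModel {
    std::atomic<std::uintptr_t> target{0};
    std::uintptr_t methodDesc = 0;
    std::uintptr_t precodeFixupThunk = 0;
};

// Stand-in for the address of an infinite loop (illustrative value only).
constexpr std::uintptr_t kSpinPlaceholder = 0x1;

// Before the stub code is mapped: Target points at the placeholder loop.
void PreInit(FixupPrecodeDataModel& d) {
    d.target.store(kSpinPlaceholder, std::memory_order_relaxed);
}

// Writer: set the plain fields first, then publish Target last.
void InitFixupPrecode(FixupPrecodeDataModel& d, std::uintptr_t methodDesc,
                      std::uintptr_t thunk, std::uintptr_t slowPathEntry) {
    d.methodDesc = methodDesc;
    d.precodeFixupThunk = thunk;
    d.target.store(slowPathEntry, std::memory_order_release);  // VolatileStore analogue
}

// Reader: the fast path is only a load of Target; the barrier is paid
// on the slow path, just before the other two fields are read.
bool TryEnterSlowPath(const FixupPrecodeDataModel& d, std::uintptr_t slowPathEntry,
                      std::uintptr_t& methodDesc, std::uintptr_t& thunk) {
    if (d.target.load(std::memory_order_relaxed) != slowPathEntry) {
        return false;  // still the placeholder: a real caller would keep spinning
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // slow-path barrier
    methodDesc = d.methodDesc;
    thunk = d.precodeFixupThunk;
    return true;
}
```

Because Target is written last, a reader that sees the real slow-path entry and then executes the barrier is guaranteed to see the other two fields. A reader that arrives too early only sees the placeholder.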