Skip to content

Enable feature stubprecode dynamic helpers again #115746

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

davidwrighton
Copy link
Member

@davidwrighton davidwrighton commented May 19, 2025

Use the barrier approach to make this safe. Adding an address dependency got into lots of details around possibly needing an entire new stub type, and several paths through this code already depend on the concept of having a barrier here. To support modifying the stubs without inserting lots of magic number, rework how stubs are identified to just do a complete compare of the entire assembly stub in a way which will be easily to generalize to the cDAC.

In addition, make changes to make the FixupPrecode path safe from the same memory ordering concerns. Since the FixupPrecode always goes through a specified entrypoint, we can slide the barrier into only the slow path as long as we ensure that the Target field written with a VolatileStore after the MethodDesc and PrecodeFixupThunk fields are set. To ensure that the barrier on the writer side has happened before with hit the barrier on the reader, we ensure that the Target field is initialized into an infinite loop before the code associated with the FixupPrecode is mapped into the process.

Fixes #113810

… stream of assembly data instead of the current approach.
…e DAC. cDAC isn't done yet, but it should be simpler than the confusion that is today's implementation.
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@davidwrighton
Copy link
Member Author

/azp list

Copy link

CI/CD Pipelines for this repository:

@davidwrighton
Copy link
Member Author

/azp run runtime-diagnostics

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@davidwrighton davidwrighton marked this pull request as ready for review May 21, 2025 21:49
@Copilot Copilot AI review requested due to automatic review settings May 21, 2025 21:49
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for dynamic helper stub precodes (version 3) by switching to a full-byte-pattern comparison approach and integrates it into both the managed contracts and native runtime metadata.

  • Introduces PrecodeStubs_3_Impl with ReadBytesAndCompare for stub/fixup detection and updates the factory to return version 3.
  • Extends PrecodeMachineDescriptor (native) with full stub/fixup byte arrays and ignore masks, and initializes them in CDacPlatformMetadata.
  • Updates allocation and detection in precode.h/.cpp, VM startup, DAC tables, and ARM64 thunk templates to use the new patterns.

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
PrecodeStubs_3.cs New v3 contract API using full-byte comparisons
PrecodeStubs_2.cs, PrecodeStubs_1.cs Ensure v2/v1 chaining still works
PrecodeStubsFactory.cs Register version 3 in factory switch
readytoruninfo.cpp, loaderallocator.hpp Switch dynamic helper allocation to AllocStub()
precode.h, precode.cpp Remove magic offsets, add Is*ByASM_DAC, use full-byte compare
cdacplatformmetadata.{hpp,cpp}, dacvars.h, datadescriptor.h Add new DAC fields & initialization for byte-pattern data
arm64 thunktemplates.* Insert memory-barrier (dmb ishld) for stubs
contracts.jsonc Bump PrecodeStubs contract version to 3
dactable.cpp, enummem.cpp Include new platform metadata header
clrfeatures.cmake Enable FEATURE_STUBPRECODE_DYNAMIC_HELPERS on AMD64/ARM64
PrecodeStubs.md Document v3 byte-pattern approach

#endif
}

void CDacPlatformMetadata::InitPrecodes()
{
PrecodeMachineDescriptor::Init(&(&g_cdacPlatformMetadata)->precode);
Copy link
Preview

Copilot AI May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expression &(&g_cdacPlatformMetadata)->precode is overly verbose. It can be simplified to &g_cdacPlatformMetadata.precode for clarity.

Suggested change
PrecodeMachineDescriptor::Init(&(&g_cdacPlatformMetadata)->precode);
PrecodeMachineDescriptor::Init(&g_cdacPlatformMetadata.precode);

Copilot uses AI. Check for mistakes.

@max-charlamb
Copy link
Contributor

/azp run runtime-diagnostics

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jkotas
Copy link
Member

jkotas commented May 21, 2025

Any numbers for the worst-case perf regressions that we expect to get from this change?

ldr x10, DATA_SLOT(StubPrecode, Target)
ldr x12, DATA_SLOT(StubPrecode, SecretParam)
br x10
brk 0xf000 // Stubs need to be 24-byte in size to allow for the data to be 3 pointers
brk 0xf000 // Stubs need to be 24-byte in size to allow for the data to be 3 pointers
LEAF_END_MARKED StubPrecodeCode\STUB_PAGE_SIZE

LEAF_ENTRY FixupPrecodeCode\STUB_PAGE_SIZE
Copy link
Member

@jkotas jkotas May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can we skip the barrier for FixupPrecode?

@@ -120,9 +120,12 @@ LEAF_END_MARKED CallCountingStubCodeTemplate, _TEXT
.irp STUB_PAGE_SIZE, 16384, 32768, 65536
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does FEATURE_MAP_THUNKS_FROM_IMAGE above need the same change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh, I missed that part of the merge. Yes.

@@ -120,9 +120,12 @@ LEAF_END_MARKED CallCountingStubCodeTemplate, _TEXT
.irp STUB_PAGE_SIZE, 16384, 32768, 65536

LEAF_ENTRY StubPrecodeCode\STUB_PAGE_SIZE
dmb ishld
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a counter-part barrier on the writer sides (e.g. in DynamicHelperFixup)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I had written... or at least thought about writing it. I'll have that put together soon.

@@ -30,11 +30,11 @@
IN_PAGE_INDEX = 0
.rept STUB_PRECODE_NUM_THUNKS_PER_MAPPING

dmb ishld
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a separate barrier instruction? Would it be possible to use ldar instruction instead of the ldr to achieve the same thing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would need to be an ldar that loads the address of the stub, not an ldar in the stub.

@davidwrighton davidwrighton marked this pull request as draft May 23, 2025 16:43
@davidwrighton davidwrighton marked this pull request as ready for review June 2, 2025 16:48
@risc-vv
Copy link

risc-vv commented Jun 9, 2025

@dotnet/samsung Could you please take a look? These changes may be related to riscv64.

@risc-vv
Copy link

risc-vv commented Jun 10, 2025

RISC-V Release-CLR-QEMU: 0 / 9117 (0.00%)
=======================
      passed: 0
      failed: 9102
     skipped: 597
      killed: 15
------------------------
 TOTAL tests: 9714
VIRTUAL time: 24h 53min 38s 624ms
   REAL time: 23min 57s 66ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-QEMU: 0 / 262 (0.00%)
=======================
      passed: 0
      failed: 0
     skipped: 14
      killed: 262
------------------------
 TOTAL tests: 276
VIRTUAL time: 41min 36s 290ms
   REAL time: 41s 89ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-CLR-VF2: 0 / 9116 (0.00%)
=======================
      passed: 0
      failed: 9101
     skipped: 597
      killed: 15
------------------------
 TOTAL tests: 9713
VIRTUAL time: 3h 3min 3s 72ms
   REAL time: 15min 30s 637ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-VF2: 0 / 262 (0.00%)
=======================
      passed: 0
      failed: 0
     skipped: 14
      killed: 262
------------------------
 TOTAL tests: 276
VIRTUAL time: 2min 50s 17ms
   REAL time: 17s 873ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: 2af00efe83a013db336ade6535e04c1c29865176
CI: 985b32219c9d1164ea1a09421e4f004672ee8c85
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

@sirntar
Copy link
Member

sirntar commented Jun 11, 2025

After running tests on riscv64, everything ended up with SEGFAULT :(
I'll attach gdb and see why

@risc-vv
Copy link

risc-vv commented Jun 11, 2025

RISC-V Release-CLR-QEMU: 0 / 9119 (0.00%)
=======================
      passed: 0
      failed: 9104
     skipped: 597
      killed: 15
------------------------
 TOTAL tests: 9716
VIRTUAL time: 25h 3min 24s 295ms
   REAL time: 24min 6s 408ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-QEMU: 0 / 262 (0.00%)
=======================
      passed: 0
      failed: 0
     skipped: 14
      killed: 262
------------------------
 TOTAL tests: 276
VIRTUAL time: 42min 9s 648ms
   REAL time: 41s 489ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-CLR-VF2: 0 / 9119 (0.00%)
=======================
      passed: 0
      failed: 9104
     skipped: 597
      killed: 15
------------------------
 TOTAL tests: 9716
VIRTUAL time: 3h 9min 3s 891ms
   REAL time: 16min 31s 660ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-VF2: 0 / 262 (0.00%)
=======================
      passed: 0
      failed: 0
     skipped: 14
      killed: 262
------------------------
 TOTAL tests: 276
VIRTUAL time: 1min 29s 526ms
   REAL time: 11s 879ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: 54b350141755a5f5ac3bb5d6a3f914c8a92404d5
CI: 8fec31ecd11a1e652234ca5056ed447368abd635
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

Copy link
Member

@sirntar sirntar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing fences from src/coreclr/vm/riscv64/thunktemplates.S will fix problems on rv64, but it's only a temporary solution. I'm looking for the reason why it doesn't work, but at the moment after jumping in CallDescrWorkerInternal to an invalid pTarget gdb loses track

Copy link
Contributor

@tomeksowi tomeksowi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try this, fixing 5 more load offsets got the tests running on my side

Comment on lines +8 to 11
fence r,rw
auipc t1, 0x4
ld t2, (StubPrecodeData__SecretParam)(t1)
ld t1, (StubPrecodeData__Target)(t1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fence r,rw
auipc t1, 0x4
ld t2, (StubPrecodeData__SecretParam)(t1)
ld t1, (StubPrecodeData__Target)(t1)
fence r,rw
auipc t1, 0x4
ld t2, (StubPrecodeData__SecretParam - 0x4)(t1)
ld t1, (StubPrecodeData__Target - 0x4)(t1)

Comment on lines +28 to 30
fence r,rw
auipc t2, 0x4
ld t3, (CallCountingStubData__RemainingCallCountCell)(t2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fence r,rw
auipc t2, 0x4
ld t3, (CallCountingStubData__RemainingCallCountCell)(t2)
fence r,rw
auipc t2, 0x4
ld t3, (CallCountingStubData__RemainingCallCountCell - 0x4)(t2)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also subtract 0x4 from the two analogous loads at lines 35 and 38, (GH suggestions don't reach there)

@sirntar
Copy link
Member

sirntar commented Jun 13, 2025

coreclr tests passed, I'm now checking corefx, but it looks like we have a solution:

src/coreclr/vm/riscv64/thunktemplates.S
LEAF_ENTRY StubPrecodeCode
    fence r, rw
    auipc t1, 0x4
    ld t2, (StubPrecodeData__SecretParam - 0x4)(t1)
    ld t1, (StubPrecodeData__Target - 0x4)(t1)
    jr t1
LEAF_END_MARKED StubPrecodeCode

LEAF_ENTRY FixupPrecodeCode
    auipc t2, 0x4
    ld t2, (FixupPrecodeData__Target)(t2)
    c.jr t2

    fence r, rw
    auipc t2, 0x4
    ld t1, (FixupPrecodeData__PrecodeFixupThunk - 0xe)(t2)
    ld t2, (FixupPrecodeData__MethodDesc - 0xe)(t2)
    jr t1
LEAF_END_MARKED FixupPrecodeCode

LEAF_ENTRY CallCountingStubCode
    fence r, rw
    auipc t2, 0x4
    ld t3, (CallCountingStubData__RemainingCallCountCell - 0x4)(t2)
    lh t1, 0(t3)
    addiw t1, t1, -1
    sh t1, 0(t3)
    beq t1, zero, LOCAL_LABEL(CountReachedZero)
    ld t1, (CallCountingStubData__TargetForMethod - 0x4)(t2)
    jr t1
LOCAL_LABEL(CountReachedZero):
    ld t1, (CallCountingStubData__TargetForThresholdReached - 0x4)(t2)
    jr t1
LEAF_END_MARKED CallCountingStubCode

EDIT: all corefx tests reported stack overflow. I'm not certain if this is still only rv64 problem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Intermittent test crash on linux arm64 caused by FEATURE_STUBPRECODE_DYNAMIC_HELPERS
7 participants