
Remove use of conservative stack scanning mode in CoreCLR #16


Closed
joshpeterson opened this issue Dec 8, 2021 · 1 comment
Assignees
Labels
Tech Debt Known tech debt that we should address later

Comments

@joshpeterson

For the time being, Unity requires the CoreCLR GC to use its conservative stack scanning mode. Longer term, we want to change the way that native code in Unity interacts with managed code so that we no longer need this GC mode.

See dotnet#61919 for more discussion in the upstream repo.

@joshpeterson joshpeterson added the Tech Debt Known tech debt that we should address later label Dec 8, 2021
@joshpeterson joshpeterson self-assigned this Dec 8, 2021
@joshpeterson
Author

Since the current upstream CoreCLR code has removed the OS-level thread suspension APIs, and the CoreCLR GC code is not safe to call with threads suspended in native code (e.g. the GC might call malloc while a suspended native thread holds the allocator lock), we have decided not to use the conservative stack scanning mode in CoreCLR.

Instead, we will modify the Unity code to avoid accessing managed memory from native code, so that we no longer need to suspend threads in native code and scan native stack frames.
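For context, here is a toy Python model of why conservative scanning differs from precise scanning (the function names, addresses, and data structures are illustrative only, not CoreCLR's):

```python
# Toy model of conservative vs. precise stack scanning (illustration only;
# a real scanner walks raw thread stacks, not Python lists).

def conservative_roots(stack_words, heap_objects):
    """Treat every stack word that matches a heap address as a root."""
    return {w for w in stack_words if w in heap_objects}

def precise_roots(stack_words, reference_slots, heap_objects):
    """Only the slots the compiler reported as references are roots."""
    return {stack_words[i] for i in reference_slots if stack_words[i] in heap_objects}

heap_objects = {0x1000, 0x2000, 0x3000}
# Stack: slot 0 holds a real reference; slot 1 is an integer that happens
# to equal a heap address; slot 2 is a plain integer.
stack_words = [0x1000, 0x2000, 42]

# Conservative scanning also keeps 0x2000 alive (a "false root").
assert conservative_roots(stack_words, heap_objects) == {0x1000, 0x2000}
assert precise_roots(stack_words, [0], heap_objects) == {0x1000}
```

The trade-off this sketches is why conservative mode needs to scan native frames at all: without compiler-reported slot information, any native stack word that looks like a heap pointer must pin an object.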

yamato-ci-bot pushed a commit that referenced this issue Jan 15, 2022
…otnet#63598)

* Fix native frame unwind in syscall on arm64 for VS4Mac crash report.

Add arm64 version of StepWithCompactNoEncoding for syscall leaf node wrappers that have compact encoding of 0.

Fix ReadCompactEncodingRegister so it actually decrements the addr.

Change StepWithCompactEncodingArm64 to match what MacOS libunwind does for framed and frameless stepping.

arm64 can have frames with the same SP (but different IPs). Increment SP for this condition so createdump's unwind
loop doesn't break out on the "SP not increasing" check and the frames are added to the thread frame list in the
correct order.

Add getting the unwind info for tail-called functions like this:

```
__ZL14PROCEndProcessPvji:
   36630:       f6 57 bd a9     stp     x22, x21, [sp, #-48]!
   36634:       f4 4f 01 a9     stp     x20, x19, [sp, #16]
   36638:       fd 7b 02 a9     stp     x29, x30, [sp, #32]
   3663c:       fd 83 00 91     add     x29, sp, #32
...
   367ac:       e9 01 80 52     mov     w9, #15
   367b0:       7f 3e 02 71     cmp     w19, #143
   367b4:       20 01 88 1a     csel    w0, w9, w8, eq
   367b8:       2e 00 00 94     bl      _PROCAbort
_TerminateProcess:
-> 367bc:       22 00 80 52     mov     w2, #1
   367c0:       9c ff ff 17     b       __ZL14PROCEndProcessPvji
```

The IP (367bc) returns the (incorrect) frameless encoding with nothing on the stack (it uses an incorrect LR to unwind). To fix this, get the unwind info for PC - 1, which points into PROCEndProcess and has the correct unwind info. This matches how lldb unwinds this frame.

Always add the module segments to the IP lookup list instead of checking the module regions.

Strip pointer authentication bits on PC/LR.
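The last two fixes can be sketched with a toy Python model (the lookup table, address ranges, and function names below are illustrative, loosely taken from the listing above):

```python
# Toy model of two of the fixes described above:
# 1. strip Arm64 pointer-authentication bits before address lookups;
# 2. look up unwind info at PC - 1 so a tail-call branch target resolves
#    to the function that actually set up the frame.

PAC_MASK = (1 << 48) - 1  # keep the low 48 virtual-address bits

def strip_pac(addr):
    return addr & PAC_MASK

# (start, end, name) ranges, sorted by start address (hypothetical table).
unwind_table = [
    (0x36630, 0x367BC, "PROCEndProcess"),
    (0x367BC, 0x367C4, "TerminateProcess"),
]

def find_unwind_info(pc):
    for start, end, name in unwind_table:
        if start <= pc < end:
            return name
    return None

# A naive lookup at 0x367bc lands in TerminateProcess, whose frameless
# encoding is wrong for this frame; pc - 1 resolves to PROCEndProcess.
assert find_unwind_info(0x367BC) == "TerminateProcess"
assert find_unwind_info(0x367BC - 1) == "PROCEndProcess"
assert strip_pac(0xFF00_0000_0003_67BC) == 0x367BC
```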
yamato-ci-bot pushed a commit that referenced this issue Feb 12, 2022
# Local heap optimizations on Arm64

1. When the allocated local heap space does not need to be zeroed (for sizes up to 64 bytes), do not emit the zeroing sequence. Instead, do stack probing and adjust the stack pointer:

```diff
-            stp     xzr, xzr, [sp,#-16]!
-            stp     xzr, xzr, [sp,#-16]!
-            stp     xzr, xzr, [sp,#-16]!
-            stp     xzr, xzr, [sp,#-16]!
+            ldr     wzr, [sp],#-64
```

2. For sizes less than one `PAGE_SIZE`, use `ldr wzr, [sp], #-amount`, which probes at `[sp]` and allocates the space at the same time. This saves one instruction for such local heap allocations:

```diff
-            ldr     wzr, [sp]
-            sub     sp, sp, #208
+            ldr     wzr, [sp],#-208
```

Use `ldp tmpReg, xzr, [sp], #-amount` when the offset is not encodable by the post-index variant of `ldr`:
```diff
-            ldr     wzr, [sp]
-            sub     sp, sp, #512
+            ldp     x0, xzr, [sp],#-512
```

3. Allow non-loop zeroing (i.e. an unrolled sequence) for sizes up to 128 bytes (i.e. up to `LCLHEAP_UNROLL_LIMIT`). This frees up two internal integer registers in such cases:

```diff
-            mov     w11, #128
-                                               ;; bbWeight=0.50 PerfScore 0.25
-G_M44913_IG19:        ; gcrefRegs=00F9 {x0 x3 x4 x5 x6 x7}, byrefRegs=0000 {}, byref, isz
             stp     xzr, xzr, [sp,#-16]!
-            subs    x11, x11, #16
-            bne     G_M44913_IG19
+            stp     xzr, xzr, [sp,#-112]!
+            stp     xzr, xzr, [sp,#16]
+            stp     xzr, xzr, [sp,#32]
+            stp     xzr, xzr, [sp,#48]
+            stp     xzr, xzr, [sp,#64]
+            stp     xzr, xzr, [sp,#80]
+            stp     xzr, xzr, [sp,#96]
```

4. Do zeroing in ascending order of the effective address:

```diff
-            mov     w7, #96
-G_M49279_IG13:
             stp     xzr, xzr, [sp,#-16]!
-            subs    x7, x7, #16
-            bne     G_M49279_IG13
+            stp     xzr, xzr, [sp,#-80]!
+            stp     xzr, xzr, [sp,#16]
+            stp     xzr, xzr, [sp,#32]
+            stp     xzr, xzr, [sp,#48]
+            stp     xzr, xzr, [sp,#64]
```

In the example, the zeroing is done at `[initialSp-16], [initialSp-96], [initialSp-80], [initialSp-64], [initialSp-48], [initialSp-32]` addresses. The idea here is to allow a CPU to detect the sequential `memset` to `0` pattern and switch into write streaming mode.
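The effective addresses listed above can be checked with a short simulation of the `stp` sequence (a sketch; the `pre`/`post` naming and the `run` helper are mine, and offsets are relative to SP on entry):

```python
# Simulate the stp sequence from the last diff to check which addresses
# get zeroed, relative to the SP value on entry (sp = 0).

def run(sequence, sp=0):
    stores = []
    for mode, offset in sequence:
        if mode == "pre":       # stp xzr, xzr, [sp, #-off]!  (pre-index)
            sp -= offset
            stores.append(sp)
        else:                   # stp xzr, xzr, [sp, #off]    (no writeback)
            stores.append(sp + offset)
    return stores

seq = [("pre", 16), ("pre", 80),
       ("post", 16), ("post", 32), ("post", 48), ("post", 64)]
# Matches the addresses listed above: initialSp-16, then -96 ascending to -32.
assert run(seq) == [-16, -96, -80, -64, -48, -32]
```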
yuc434 pushed a commit that referenced this issue Aug 8, 2022
* Initial implementation for contract customization

fix build errors

Move converter rooting to DefaultJsonTypeInfoResolver so that it can be used standalone

Fix ConfigurationList.IsReadOnly

Minor refactorings (#1)

* Makes the following changes:

* Move singleton initialization for DefaultTypeInfoResolver behind a static property.
* Consolidate JsonSerializerContext & IJsonTypeInfoResolver values to a single field.
* Move reflection fallback logic away from JsonSerializerContext and into JsonSerializerOptions

* Update src/libraries/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializerOptions.cs

* remove testing of removed field

Simplify the JsonTypeInfo.CreateObject implementation (#2)

* Simplify the JsonTypeInfo.CreateObject implementation

* Update src/libraries/System.Text.Json/src/System/Text/Json/Serialization/Metadata/JsonTypeInfoOfT.cs

* Update src/libraries/System.Text.Json/src/System/Text/Json/Serialization/Metadata/JsonTypeInfoOfT.cs

Co-authored-by: Krzysztof Wicher <[email protected]>

Co-authored-by: Krzysztof Wicher <[email protected]>

Tests and fixes for JsonTypeInfoKind.None

TypeInfo type mismatch tests

Allow setting NumberHandling on JsonTypeInfoKind.None

test resolver returning wrong type of options

JsonTypeInfo/JsonPropertyInfo mutability tests

rename test file

Move default converter rooting responsibility behind DefaultJsonTypeInfoResolver (#3)

* Move default converter rooting responsibility behind DefaultJsonTypeInfoResolver

* address feedback

Add simple test for using JsonTypeInfo<T> with APIs directly taking it

fix and tests for untyped/typed CreateObject

uncomment test cases, remove todo

More tests and tiny fixes

Add a JsonTypeInfoResolver.Combine test for JsonSerializerContext (#4)

* Fix JsonTypeInfoResolver.Combine for JsonSerializerContext

* Break up failing test

Fix simple scenarios for combining contexts (#6)

* Fix simple scenarios for combining contexts

* feedback

JsonSerializerContext combine test with different camel casing

Remove unneeded virtual calls & branching when accessing Get & Set delegates (#7)

JsonPropertyInfo tests everything minus ShouldSerialize & NumberHandling

Update src/libraries/System.Text.Json/src/System/Text/Json/Serialization/JsonConverterOfT.cs

Update src/libraries/System.Text.Json/src/System/Text/Json/Serialization/JsonConverterOfT.cs

throw InvalidOperationException rather than ArgumentNullException for source gen when PropertyInfo.Name is assigned through JsonPropertyInfoValues

tests for duplicated property names and JsonPropertyInfo.NumberHandling

Add tests for NumberHandling and failing tests for ShouldSerialize

disable the failing test and add extra checks

disable remainder of the failing ShouldSerialize tests, fix working one

Fix ShouldSerialize and IgnoreCondition interop

Add failing tests for CreateObject + parametrized constructors

Fix CreateObject support for JsonConstructor types (#10)

* Fix CreateObject support for JsonConstructor types

* address feedback

Make contexts more combinator friendly (#9)

* Make contexts more combinator friendly

* remove converter cache

* redesign test to account for JsonConstructorAttribute

* Combine unit tests

* address feedback

* Add acceptance tests for DataContract attributes & Specified pattern (#11)

* Add private field serialization acceptance test (#13)

* tests, PR feedback (#14)

* PR feedback and extra tests

* Shorten class name, remove incorrect check (not true for polymorphic cases)

* Make parameter matching for custom properties map property Name with parameter (#16)

* Test static initialization with JsonTypeInfo (#17)

* Fix test failures and proper fix this time (#18)

* Fix test failures and proper fix this time

* reinstate ActiveIssueAttribute

* PR feedback and adjust couple of tests which don't set TypeInfoResolver

* fix IAsyncEnumerable tests

* Lock JsonSerializerOptions in JsonTypeInfo.EnsureConfigured()

Co-authored-by: Eirik Tsarpalis <[email protected]>

Co-authored-by: Eirik Tsarpalis <[email protected]>
bholmes pushed a commit that referenced this issue Nov 3, 2022
…6616)

* Support Arm64 "constructed" constants in SuperPMI asm diffs

SuperPMI asm diffs tries to ignore constants that can change between
multiple replays, such as addresses that the replay engine must generate
and not simply hand back from the collected data.

Often, addresses have associated relocations generated during replay.
SuperPMI can use these relocations to adjust the constants to allow
two replays to match. However, there are cases on Arm64 where an address
both doesn't report a relocation and is "constructed" using multiple
`mov`/`movk` instructions.

One case is the `allocPgoInstrumentationBySchema()`
API which returns a pointer to a PGO data buffer. An address within this
buffer is constructed via a sequence such as:
```
mov     x0, #63408
movk    x0, #23602, lsl #16
movk    x0, #606, lsl #32
```

When SuperPMI replays this API, it constructs a new buffer and returns that
pointer, which is used to construct various actual addresses that are
generated as "constructed" constants, shown above.

This change "de-constructs" the constants and looks them up in the replay
address map. If base and diff match the mapped constants, there is no asm diff.

* Fix 32-bit build

I don't think we fully support 64-bit replay on a 32-bit host, but this
fix at least makes it possible in this case.

* Support more general mov/movk sequence

Allow JIT1 and JIT2 to have a different sequence of
mov/movk[/movk[/movk]] that map to the same address in the
address map. That is, the replay constant might require a different
set of instructions (e.g., if a `movk` is missing because its constant
is zero).
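The "de-construction" this change describes can be sketched by folding the mov/movk lanes back into a 64-bit value (a sketch; the `materialize` helper is a hypothetical name, not SuperPMI's):

```python
# Reconstruct the 64-bit constant built by a mov/movk sequence like the
# one shown above, as a "de-construction" pass might.

def materialize(movs):
    """movs: list of (imm16, shift) pairs; first is movz, rest are movk."""
    value = 0
    for imm16, shift in movs:
        value &= ~(0xFFFF << shift)           # movk replaces one 16-bit lane
        value |= (imm16 & 0xFFFF) << shift
    return value

# mov x0, #63408; movk x0, #23602, lsl #16; movk x0, #606, lsl #32
addr = materialize([(63408, 0), (23602, 16), (606, 32)])
assert addr == (606 << 32) | (23602 << 16) | 63408
```

Once the constant is reassembled, it can be looked up in the replay address map like any relocated address, which is what lets base and diff replays match.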
mrvoorhe pushed a commit that referenced this issue Mar 19, 2025
* JIT: Introduce `LclVarDsc::lvIsMultiRegDest`

With recent work to expand returned promoted locals into `FIELD_LIST`
the only "whole references" of promoted locals we should see is when
stored from a multi-reg node. This is the only knowledge the backend
should need for correctness purposes, so introduce a bit to track this
property, and switch the backend to check this instead.

The existing `lvIsMultiRegRet` is essentially this + whether the local
is returned. We should be able to remove this, but it is currently used
for some heuristics in old promotion, so keep it around for now.

* JIT: Add some more constant folding in lowering

Add folding for shifts and certain binops that are now getting produced
late due to returned `FIELD_LIST` nodes.

win-arm64 example:
```csharp
[MethodImpl(MethodImplOptions.NoInlining)]
static ValueTask<byte> Foo()
{
    return new ValueTask<byte>(123);
}
```

```diff
 G_M17084_IG02:  ;; offset=0x0008
             mov     x0, xzr
-            mov     w1, #1
-            mov     w2, wzr
-            mov     w3, #123
-            orr     w2, w2, w3,  LSL #16
-            orr     w1, w2, w1,  LSL #24
-						;; size=24 bbWeight=1 PerfScore 4.00
+            mov     w1, #0x17B0000
+						;; size=8 bbWeight=1 PerfScore 1.00
```
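The folded constant in the diff above can be checked directly with a quick sketch that mirrors the orr/shift sequence:

```python
# Check the fold in the diff above: the mov/orr sequence and the single
# mov produce the same 32-bit value.

w1, w2, w3 = 1, 0, 123          # mov w1, #1 / mov w2, wzr / mov w3, #123
w2 = w2 | (w3 << 16)            # orr w2, w2, w3, LSL #16
w1 = w2 | (w1 << 24)            # orr w1, w2, w1, LSL #24
assert w1 == 0x17B0000          # the single mov in the replacement code
```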

* Feedback
alexey-zakharov pushed a commit that referenced this issue Jun 3, 2025
Add support using bitwise operations to reconstruct registers passed
into calls from multiple promoted fields. Remove the IR invariant that
`FIELD_LIST` args must always map cleanly to registers; instead, any
`FIELD_LIST` is allowed for register-only arguments before lowering, and
lowering takes care to normalize them into a handled shape.

`fgTryMorphStructArg` is changed to take advantage of this by now
producing `FIELD_LIST` even when a promoted arg does not match the
target ABI. Support in physical promotion will be added in a follow up.

Split arguments are not handled and retain the old IR invariant of
requiring registers and stack slots to make cleanly from `FIELD_LIST`.

win-x64 examples:
```csharp
static void Foo(int x)
{
    Use<int?>(x);
    Use<int?>(5);
    Use<int?>(null);
}
```

```diff
 G_M7200_IG02:  ;; offset=0x0004
-       mov      byte  ptr [rsp+0x20], 1
-       mov      dword ptr [rsp+0x24], ecx
-       mov      rcx, qword ptr [rsp+0x20]
+       mov      ecx, ecx
+       shl      rcx, 32
+       or       rcx, 1
        call     [Program:Bar(System.Nullable`1[int])]
-       mov      dword ptr [rsp+0x24], 5
-       mov      rcx, qword ptr [rsp+0x20]
+       mov      rcx, 0x500000001
        call     [Program:Bar(System.Nullable`1[int])]
-       mov      byte  ptr [rsp+0x20], 0
        xor      ecx, ecx
-       mov      dword ptr [rsp+0x24], ecx
-       mov      rcx, qword ptr [rsp+0x20]
-						;; size=55 bbWeight=1 PerfScore 14.25
+						;; size=34 bbWeight=1 PerfScore 7.50

 G_M7200_IG03:
        add      rsp, 40
        tail.jmp [Program:Bar(System.Nullable`1[int])]
 						;; size=10 bbWeight=1 PerfScore 2.25
```
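The packed register values in the diff can be checked with a short sketch (the layout assumption, `hasValue` in the low bits and the value in the upper 32 bits, follows the `shl`/`or` sequence above; `pack_nullable_int` is a hypothetical name):

```python
# Check the register image the diff above builds for Nullable<int>.

def pack_nullable_int(value):
    if value is None:
        return 0                                 # xor ecx, ecx
    return ((value & 0xFFFFFFFF) << 32) | 1      # shl rcx, 32 / or rcx, 1

assert pack_nullable_int(5) == 0x500000001       # matches mov rcx, 0x500000001
assert pack_nullable_int(None) == 0
```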

```csharp
static void Foo(int x, float y)
{
    Use((x, y));
}
```

```diff
 G_M42652_IG01:  ;; offset=0x0000
-       push     rax
-						;; size=1 bbWeight=1 PerfScore 1.00
+						;; size=0 bbWeight=1 PerfScore 0.00

 G_M42652_IG02:
-       mov      dword ptr [rsp], ecx
-       vmovss   dword ptr [rsp+0x04], xmm1
-       mov      rcx, qword ptr [rsp]
+       vmovd    eax, xmm1
+       shl      rax, 32
+       mov      ecx, ecx
+       or       rcx, rax
 						;; size=13 bbWeight=1 PerfScore 3.00

 G_M42652_IG03:
-       add      rsp, 8
        tail.jmp [Program:Use[System.ValueTuple`2[int,float]](System.ValueTuple`2[int,float])]
```
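The `(int, float)` case can be sketched the same way (the float's bit pattern goes in the upper 32 bits via `vmovd`/`shl`, the int in the lower 32; `pack_int_float` is a hypothetical name):

```python
import struct

# Check the register image for the (int x, float y) tuple in the diff above.

def pack_int_float(x, y):
    ybits = struct.unpack("<I", struct.pack("<f", y))[0]   # vmovd eax, xmm1
    return (ybits << 32) | (x & 0xFFFFFFFF)                # shl rax, 32 / or rcx, rax

# 1.0f has the IEEE-754 bit pattern 0x3F800000.
assert pack_int_float(7, 1.0) == (0x3F800000 << 32) | 7
```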

A win-arm64 example:
```csharp
static void Foo(int[] arr)
{
    Use(arr.AsMemory());
}
```

```diff
 G_M33990_IG01:
-            stp     fp, lr, [sp, #-0x20]!
+            stp     fp, lr, [sp, #-0x10]!
             mov     fp, sp
-            str     xzr, [fp, #0x10]	// [V03 tmp2]
-						;; size=12 bbWeight=1 PerfScore 2.50
+						;; size=8 bbWeight=1 PerfScore 1.50

 G_M33990_IG02:
             cbz     x0, G_M33990_IG04
 						;; size=4 bbWeight=1 PerfScore 1.00

 G_M33990_IG03:
-            str     x0, [fp, #0x10]	// [V07 tmp6]
-            str     wzr, [fp, #0x18]	// [V08 tmp7]
-            ldr     w0, [x0, #0x08]
-            str     w0, [fp, #0x1C]	// [V09 tmp8]
+            ldr     w1, [x0, #0x08]
             b       G_M33990_IG05
-						;; size=20 bbWeight=0.50 PerfScore 3.50
+						;; size=8 bbWeight=0.50 PerfScore 2.00

 G_M33990_IG04:
-            str     xzr, [fp, #0x10]	// [V07 tmp6]
-            str     xzr, [fp, #0x18]
-						;; size=8 bbWeight=0.50 PerfScore 1.00
+            mov     x0, xzr
+            mov     w1, wzr
+						;; size=8 bbWeight=0.50 PerfScore 0.50

 G_M33990_IG05:
-            ldp     x0, x1, [fp, #0x10]	// [V03 tmp2], [V03 tmp2+0x08]
-            movz    x2, #0xD920      // code for Program:Use[System.Memory`1[int]](System.Memory`1[int])
-            movk    x2, #0x4590 LSL #16
+            mov     w1, w1
+            lsl     x1, x1, #32
+            movz    x2, #0xD950      // code for Program:Use[System.Memory`1[int]](System.Memory`1[int])
+            movk    x2, #0x4592 LSL #16
             movk    x2, #0x7FFE LSL #32
             ldr     x2, [x2]
-						;; size=20 bbWeight=1 PerfScore 7.50
+						;; size=24 bbWeight=1 PerfScore 6.00

 G_M33990_IG06:
-            ldp     fp, lr, [sp], #0x20
+            ldp     fp, lr, [sp], #0x10
             br      x2
 						;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code: 72
+; Total bytes of code: 60
```