Description
We will continue to improve code quality for Arm64 targets in .NET 10 to benefit customers who run, or want to run, their workloads on Arm64 hardware.
General optimizations
PAC/RET feature enablement
- Cobalt 100 hardware has the pointer-authentication extension, and as a security measure we would like to add support for it in .NET 10, both in the .NET runtime and in JIT-generated code. More details can be found in ARM64: Support for PAC-RET in .NET10 #109457. PR: Arm-64: Add initial support for PAC-RET #110472
Compact encoding
- Improve code quality by using instructions that perform more than one operation, which makes the Arm64 encoding more compact. As part of this work, we will also revisit addressing modes that are currently ignored or used infrequently (e.g. post-index addressing mode) but can give much better code quality. Related: Review the multi-op instruction usage for Arm64 #68028
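As a rough illustration of the source pattern that post-index addressing targets, consider a loop that loads through a pointer and then advances it. On Arm64, a post-index load such as `ldr x1, [x0], #8` performs the load and the pointer update in one instruction. The sketch below is C++ rather than C#, the function name `SumPostIndex` is hypothetical, and whether a particular compiler actually emits the post-index form depends on the compiler and its settings.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration only; not taken from the .NET runtime.
// Each iteration loads through the pointer and then advances it by one element.
// On Arm64 this load-then-increment pair can be encoded as a single post-index
// load such as `ldr x1, [x0], #8`, instead of a separate `ldr` and `add`.
std::uint64_t SumPostIndex(const std::uint64_t* p, std::size_t count)
{
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        sum += *p++; // load from p, then bump p by sizeof(std::uint64_t)
    }
    return sum;
}
```

Calling `SumPostIndex` on an array of four 64-bit values walks the array with exactly one load and one pointer bump per element, which is the shape a code generator would need to recognize to fold the two into one instruction.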
Improvements in GC
- Improve GC's vxsort algorithm to take advantage of NEON intrinsics. Related: ARM64 GC: Use SVE when sorting the mark list #108473 PR: vxsort: Add Arm64 Neon implementation and tests #110692
- Modernize write barriers for Arm64: In various benchmarks, we have seen that the write barrier on arm64 is more time consuming than its x86 counterpart. This is despite the fact that arm64 has a conservative write barrier (which does less work) instead of the precise write barrier present on x86 (which does more work). The first step is to analyze the results from our experiments done in Significant Performance Disparity Between Arm64 and x64 Write Barriers #106051. The next step would be to investigate and enable a precise write barrier for arm64. On x64, it showed significant wins in GC pause time and hence overall throughput. Another thing we want to explore is what happens when we have multiple versions of the write barrier, similar to x86, and whether that gives us any benefits.
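To make the conservative versus precise distinction concrete, here is a toy card-marking model. It is a sketch under stated assumptions, not the runtime's implementation: the `ToyHeap` type, the card size, the two-generation lookup, and the exact filtering conditions are all hypothetical. It assumes "conservative" means dirtying a card whenever the stored reference points into the ephemeral range, and "precise" means additionally checking that the target is in a younger generation than the location being written.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model for illustration only. All names, the card size, and the
// generation lookup are assumptions, not the .NET runtime's implementation.
// Addresses are modeled as plain offsets into a small heap.
struct ToyHeap
{
    static constexpr std::size_t kCardSize = 256; // bytes covered by one card (assumed)
    std::uintptr_t ephemeralLow = 0;              // start of the young-generation range
    std::uintptr_t ephemeralHigh = 0;             // end (exclusive) of that range
    std::vector<std::uint8_t> cards;              // one byte per card, 0xFF means dirty
    std::size_t cardsMarked = 0;                  // number of cards this model dirtied

    explicit ToyHeap(std::size_t heapBytes) : cards(heapBytes / kCardSize, 0) {}

    bool InEphemeral(std::uintptr_t addr) const
    {
        return addr >= ephemeralLow && addr < ephemeralHigh;
    }

    // Hypothetical lookup: ephemeral addresses are gen0, everything else is gen2.
    int GenerationOf(std::uintptr_t addr) const { return InEphemeral(addr) ? 0 : 2; }

    void MarkCard(std::uintptr_t dst)
    {
        std::uint8_t& card = cards[dst / kCardSize];
        if (card != 0xFF) { card = 0xFF; ++cardsMarked; }
    }

    // Conservative: one range check, then dirty the card for the written location.
    void ConservativeBarrier(std::uintptr_t dst, std::uintptr_t ref)
    {
        if (InEphemeral(ref)) MarkCard(dst);
    }

    // Precise: an extra generation comparison filters out young-to-young stores.
    void PreciseBarrier(std::uintptr_t dst, std::uintptr_t ref)
    {
        if (InEphemeral(ref) && GenerationOf(ref) < GenerationOf(dst)) MarkCard(dst);
    }
};
```

In this model the precise barrier does more work per store but dirties fewer cards, which is one way a more expensive barrier could still reduce the card scanning done during a GC pause.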
Scalable Vector Extension
Wrap up the non-streaming SVE work
- Complete Pri1 issues found during .NET 9
- Complete Pri2 issues found during .NET 9
- Complete Pri3 issues found during .NET 9
Add vector-length-agnostic (VL-agnostic) support
The primary requirement before starting the design of streaming-mode SVE and SME is to add VL-agnostic support to the JIT and the .NET runtime. This includes the following:
- (WIP) Introduce TYP_SIMD and educate various JIT code paths about the new type. See whether some portion of this can be achieved along the lines of how we handle stackalloc.
- (WIP) Make sure getVectorTByteLength() returns the VL that is available on the hardware and fix all the JIT code paths affected.
- Sort locals such that TYP_SIMD/TYP_MASK come last. They will be placed at the bottom of the stack frame layout.
- (WIP) Access the stack offsets of TYP_SIMD/TYP_MASK using SVE instructions.
- Enable non-streaming SVE for NativeAOT / crossgen with VL agnostic.
Reference: #101477
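The local-sorting step in the list above can be sketched as follows. This is a minimal model, not the JIT's actual data structures: `ToyType`, `ToyLocal`, and `LayoutFrame` are hypothetical names, alignment is ignored, and the idea that fixed-size locals should keep compile-time-constant offsets regardless of the hardware vector length is an assumption about the motivation.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for JIT concepts; not the actual JIT data structures.
enum class ToyType { Int, Long, Simd, Mask };

struct ToyLocal
{
    std::string name;
    ToyType type;
    std::size_t fixedSize; // bytes; unused for Simd/Mask, whose size depends on VL
    long offset;           // byte offset for fixed-size locals, -1 for VL-dependent ones
};

bool IsScalable(ToyType t) { return t == ToyType::Simd || t == ToyType::Mask; }

// Moves VL-dependent locals to the end (keeping relative order), then assigns
// byte offsets to the fixed-size locals only. Returns the size in bytes of the
// fixed-size area, which does not depend on the vector length.
std::size_t LayoutFrame(std::vector<ToyLocal>& locals)
{
    std::stable_partition(locals.begin(), locals.end(),
                          [](const ToyLocal& l) { return !IsScalable(l.type); });

    std::size_t offset = 0;
    for (ToyLocal& l : locals)
    {
        if (IsScalable(l.type))
        {
            l.offset = -1; // would be addressed in VL-scaled units instead
            continue;
        }
        l.offset = static_cast<long>(offset);
        offset += l.fixedSize;
    }
    return offset;
}
```

With locals declared in the order `a` (Long), `v` (Simd), `b` (Long), `m` (Mask), the sketch reorders them to `a`, `b`, `v`, `m`, so the offsets of `a` and `b` are known without knowing the vector length.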
Design streaming mode SVE and SME
- Come up with API design of streaming-mode SVE and SME and its interaction with non-streaming APIs as well as NEON APIs.
- Implications of streaming-mode switches on the overall execution of the .NET runtime
- Handling of diagnostics and debugging during streaming mode
- NativeAOT and crossgen support in presence of streaming mode flag toggles
- How faults and exceptions will be handled, and how the state restore will happen.
- Handling of ZA storage register in LSRA
- PR: [SME] Design proposal #115037
SVE2 APIs
- Implement the approved non-streaming SVE2 APIs (Arm64: Implement SVE2 APIs #115479)
Stretch
- Use Arm intrinsics in more places in BCL
- [arm/arm64] Leaf frames, saving LR, and return address hijacking #35274
- Some of the lower-priority SVE issues in Pri2 and Pri3 will be done if time permits
- Prototype streaming mode SVE and SME design on M4 (if available)
References: