Delete FEATURE_DOUBLE_ALIGNMENT_HINT for GC heap allocations #115985

Merged
jkotas merged 4 commits into dotnet:main from the FEATURE_DOUBLE_ALIGNMENT_HINT branch on May 26, 2025

Conversation

jkotas
Member

@jkotas jkotas commented May 26, 2025

This optimization is no longer relevant on current x86 hardware. The performance penalty for misaligned memory access is negligible compared to what it used to be.

Fixes #101284

@Copilot Copilot AI review requested due to automatic review settings May 26, 2025 02:31
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR removes the FEATURE_DOUBLE_ALIGNMENT_HINT code related to optimizing GC heap allocations for arrays of doubles, since modern x86 hardware no longer suffers from significant performance penalties due to misaligned memory accesses.

  • Removed FEATURE_DOUBLE_ALIGNMENT_HINT blocks in helper functions for GC memory allocations.
  • Removed configuration settings and related macros associated with the double alignment hint.
  • Updated comments to reflect the new behavior for double alignment.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

File | Description
src/coreclr/vm/jitinterface.cpp | Removed alignment hint adjustments in GC allocation helpers.
src/coreclr/vm/i386/jitinterfacex86.cpp | Removed threshold adjustments for double arrays for large object heap.
src/coreclr/vm/gchelpers.cpp | Deleted FEATURE_DOUBLE_ALIGNMENT_HINT conditions in allocation routines.
src/coreclr/vm/eeconfig.h | Removed configuration getters and members for the double alignment feature.
src/coreclr/vm/eeconfig.cpp | Deleted initialization and configuration code for double alignment.
src/coreclr/inc/switches.h | Updated comments to reflect changes in double alignment usage.
src/coreclr/inc/clrconfigvalues.h | Removed configuration value for double array to large object heap mapping.

Comments suppressed due to low confidence (1)

src/coreclr/inc/switches.h:154

  • The updated comment now focuses on double alignment for structs on the stack, which differs from the previous emphasis on arrays of doubles. Please verify that this comment accurately reflects the intended behavior after removing FEATURE_DOUBLE_ALIGNMENT_HINT.
// Prefer double alignment for structs with doubles on the stack.

@jkotas
Member Author

jkotas commented May 26, 2025

See discussion at EgorBot/runtime-utils#356 (comment)

  • Keeping the optimization for stack allocations for now. The impact of removing it for stack allocations can be evaluated separately.
  • The implementations of x86 allocation helpers will be deleted in the upcoming change that unifies allocation helpers with NAOT.

@jkotas
Member Author

jkotas commented May 26, 2025

cc @filipnavara

@am11
Member

am11 commented May 26, 2025

@EgorBot -windows_intel -use32bit

using BenchmarkDotNet.Attributes;

public class Bench
{
    [Params(16, 32, 64, 128)]
    public int Size;

    [Benchmark]
    public double[] AllocDoubleArray() => new double[Size];

    [Benchmark]
    public double[] AllocDoubleArrayFixed() => new double[32];
}

Member

@am11 am11 left a comment

👍

@@ -151,8 +151,7 @@
 #define FEATURE_64BIT_ALIGNMENT
 #endif

-// Prefer double alignment for structs and arrays with doubles. Put arrays of doubles more agressively
-// into large object heap for performance because large object heap is 8 byte aligned
+// Prefer double alignment for structs with doubles on the stack.
 #if !defined(FEATURE_64BIT_ALIGNMENT) && !defined(HOST_64BIT)
 #define FEATURE_DOUBLE_ALIGNMENT_HINT
Member

The other usage is in

if (ShouldAlign8(numR8Fields, numInstanceFields))

which can be inlined:
if (numR8Fields * 2 > numInstanceFields && numR8Fields >= 2)
(only usage of ShouldAlign8)
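
For readers unfamiliar with the heuristic, here is a minimal standalone sketch of the condition quoted above. The function body is copied from the inlined expression in this comment; the free-function form and parameter names are assumptions for illustration, not the runtime's actual declaration.

```cpp
// Hypothetical standalone copy of the heuristic quoted in the review comment.
// It prefers 8-byte alignment when double (R8) fields make up more than half
// of the instance fields and there are at least two of them.
bool ShouldAlign8(int numR8Fields, int numInstanceFields)
{
    return numR8Fields * 2 > numInstanceFields && numR8Fields >= 2;
}
```

Under this rule a type with two doubles among three fields qualifies, while a type with a single double field, or with two doubles among four fields, does not.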

Member Author

Yes, it can be inlined, but that won't make the code easier to understand. This heuristic is duplicated in the AOT compilers in a method with the same name:

private static bool ShouldAlign8(int dwR8Fields, int dwTotalFields)

Member

My point was FEATURE_DOUBLE_ALIGNMENT_HINT can be replaced by defined(FEATURE_64BIT_ALIGNMENT) || defined(TARGET_32BIT) since the remaining two usages have nothing to do with double alignment.

Member Author

@jkotas jkotas May 26, 2025

The remaining usages have to do with double alignment on the stack. I have updated the comment on FEATURE_DOUBLE_ALIGNMENT_HINT to reflect that, and mentioned in the comment above that we can evaluate deleting that separately.

Member

@filipnavara filipnavara left a comment

Thanks!

@MichalStrehovsky
Member

We may have outerloop tests that test this such as:

// Goal: Test arrays of doubles are allocated on large object heap and therefore 8 byte aligned
// Assumptions:
// 1) large object heap is always 8 byte aligned
// 2) double array greater than 1000 elements is on large object heap
// 3) non-double array greater than 1000 elements but less than 85K is NOT on large object heap
// 4) new arrays allocated in large object heap is of generation 2
// 5) new arrays NOT allocated in large object heap is of generation 0
// 6) the threshold can be set by registry key DoubleArrayToLargeObjectHeap

Searching the codebase for 101284 may find those places because we disable these for native AOT and that's the issue number for it.
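
Assumptions 2 and 3 in the test comment above describe the placement rule this PR removes. A rough sketch of that rule follows; the constant names, the function, and the reading of "85K" as 85,000 bytes are assumptions made for illustration, and object header overhead is ignored. It is not the runtime's code.

```cpp
#include <cstddef>

// Hypothetical constants taken from the test comment, not runtime identifiers.
constexpr std::size_t kDoubleArrayElementThreshold = 1000;  // assumption 2
constexpr std::size_t kLargeObjectSizeThreshold = 85000;    // "85K" in assumption 3, assumed to be bytes

// Returns true when the old rule, as the test comment describes it, would
// place the array on the large object heap. Header overhead is ignored.
bool ExpectLargeObjectHeap(bool isDoubleArray, std::size_t elementCount, std::size_t elementSize)
{
    // Old special case: double arrays moved to the large object heap early
    // so that they would be 8-byte aligned.
    if (isDoubleArray && elementCount > kDoubleArrayElementThreshold)
        return true;

    // General rule: only objects at or above the size threshold qualify.
    return elementCount * elementSize >= kLargeObjectSizeThreshold;
}
```

With the special case deleted, only the size check would remain, which is why tests built on assumption 2 are the ones to look for.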

@jkotas
Member Author

jkotas commented May 26, 2025

/azp run runtime-coreclr outerloop

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@jkotas
Member Author

jkotas commented May 26, 2025

/azp run runtime-coreclr outerloop

Azure Pipelines successfully started running 1 pipeline(s).

@MichalPetryka
Contributor

MichalPetryka commented May 26, 2025

Isn't this still required for correctness on ARM32?

@filipnavara
Member

> Isn't this still required for correctness on ARM32?

ARM32 [and WASM] uses a different define - FEATURE_64BIT_ALIGNMENT.

@MichalPetryka
Contributor

> > Isn't this still required for correctness on ARM32?
>
> ARM32 [and WASM] uses a different define - FEATURE_64BIT_ALIGNMENT.

Ah okay.

> The performance penalty for misaligned memory access is negligible compared to what it used to be.

Wasn't there an issue a few months ago where an Intel engineer confirmed that unaligned atomics are very slow on X86 for 64bit xchg/cmpxchg? It might be worth testing how those are affected in perf with this.

@filipnavara
Member

> Wasn't there an issue a few months ago where an Intel engineer confirmed that unaligned atomics are very slow on X86 for 64bit xchg/cmpxchg? It might be worth testing how those are affected in perf with this.

This PR specifically affects the double type. I don't consider it likely that someone would use Interlocked.[Compare]Exchange with a double on x86.

@jkotas
Member Author

jkotas commented May 26, 2025

> Wasn't there an issue a few months ago where an Intel engineer confirmed that unaligned atomics are very slow on X86 for 64bit xchg/cmpxchg? It might be worth testing how those are affected in perf with this.

Yes, that's correct. However, Interlocked.* operations are much more likely to be used for long/ulong than for double. long/ulong are misaligned and have this problem like half of the time on x86. This PR is not changing how long/ulong are handled.
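
The "half of the time" figure can be illustrated with a small sketch. It assumes 8-byte values land on every 4-byte-aligned address with equal likelihood, which is a simplification; real field and object layout depends on the runtime. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// True when the address is a multiple of 8.
bool IsAligned8(std::uintptr_t address)
{
    return address % 8 == 0;
}

// Walks `count` consecutive 4-byte-aligned addresses starting at `start`
// (assumed to be 4-byte aligned itself) and counts those that are not
// 8-byte aligned.
std::size_t CountMisaligned8(std::uintptr_t start, std::size_t count)
{
    std::size_t misaligned = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        if (!IsAligned8(start + 4 * i))
            ++misaligned;
    }
    return misaligned;
}
```

Every second 4-byte-aligned address fails the 8-byte check, so under the uniform assumption about half of all 8-byte values would be misaligned.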

@jkotas
Member Author

jkotas commented May 26, 2025

/ba-g infrastructure timeouts and known issues

@jkotas jkotas merged commit f154d65 into dotnet:main May 26, 2025
147 of 153 checks passed
@filipnavara
Member

Thanks!

@jkotas jkotas deleted the FEATURE_DOUBLE_ALIGNMENT_HINT branch May 27, 2025 21:20
@github-actions github-actions bot locked and limited conversation to collaborators Jun 27, 2025
Development

Successfully merging this pull request may close these issues.

Port FEATURE_DOUBLE_ALIGNMENT_HINT to native AOT
6 participants