CoreML CPU InstanceNorm3d behaves differently depending on track_running_stats #2666

@0xShug0

Description

🐛 Describe the bug

CoreML CPU InstanceNorm3d behaves differently depending on track_running_stats

Summary

I found this with opdiff while comparing PyTorch modules across backends.

For torch.nn.InstanceNorm3d(256, affine=False) on the CoreML CPU FP32 path (FP16 is also affected, but the example script only uses FP32), I see different behavior depending on track_running_stats:

  • track_running_stats=False:
    export succeeds, CoreML inference succeeds, and output matches PyTorch closely
  • track_running_stats=True:
    export begins the same way, but ct.convert(...) fails with:
    NotImplementedError: Unsupported fx node alias, kind alias

So this looks like an inconsistency triggered only by enabling running stats on the same module family / backend configuration.


Minimal Repro

Self-contained repro script:

coreml_instancenorm3d_running_stats.py

Core setup:

model = nn.InstanceNorm3d(
    256,
    affine=False,
    track_running_stats=track_running_stats,
).eval()

Input shape:

x = torch.randn(1, 256, 32, 28, 28, dtype=torch.float32)

CoreML config:

mlmodel = ct.convert(
    exported,
    inputs=[ct.TensorType(shape=x.shape)],
    convert_to="mlprogram",
    minimum_deployment_target=ct.target.iOS18,
    compute_units=ct.ComputeUnit.CPU_ONLY,
    compute_precision=ct.precision.FLOAT32,
)

Export path:

exported = torch.export.export(model, (x,))
exported = exported.run_decompositions()

Observed Behavior

track_running_stats=False

This succeeds end to end:

  • torch.export.export OK
  • run_decompositions() OK
  • ct.convert(...) OK
  • mlmodel.predict(...) OK

Output parity is good:

  • max_abs_diff = 1.430511474609375e-06
  • mean_abs_diff = 5.992006890664925e-08

track_running_stats=True

This fails at conversion:

NotImplementedError: Unsupported fx node alias, kind alias

So the “off” mode works, while the “on” mode fails for a setup that is otherwise identical down to the backend and precision settings.


Why this seems notable

This does not look like a generic InstanceNorm3d failure, because the same repro succeeds when track_running_stats=False.

That makes it seem specifically tied to the running-stats version of the module, possibly due to how buffers / aliasing are represented after export + decomposition.


Expected Behavior

I’d expect one of these:

  • both configurations convert and run, or
  • the unsupported case is rejected explicitly, with a clear error message or a documented limitation

Right now it looks surprising that toggling track_running_stats flips the CoreML CPU FP32 result from “works with good parity” to “conversion failure”.


Environment

  • PyTorch 2.10.0
  • coremltools 9.0
  • Python 3.11
  • macOS / Apple Silicon

Metadata

Assignees: none
Labels: PyTorch (traced), bug (unexpected behaviour that should be corrected)