🐛 Describe the bug
CoreML CPU InstanceNorm3d behaves differently depending on track_running_stats
Summary
I found this with opdiff while comparing PyTorch modules across backends.
For torch.nn.InstanceNorm3d(256, affine=False) on the CoreML CPU FP32 path (also FP16, but the example script only uses FP32), I see different behavior depending on track_running_stats:
track_running_stats=False:
export succeeds, CoreML inference succeeds, and output matches PyTorch closely
track_running_stats=True:
export begins the same way, but ct.convert(...) fails with:
```
NotImplementedError: Unsupported fx node alias, kind alias
```
So this looks like an inconsistency triggered only by enabling running stats on the same module family / backend configuration.
Minimal Repro
Self-contained repro script:
coreml_instancenorm3d_running_stats.py
Core setup:
```python
model = nn.InstanceNorm3d(
    256,
    affine=False,
    track_running_stats=track_running_stats,
).eval()
```
Input shape:
```python
x = torch.randn(1, 256, 32, 28, 28, dtype=torch.float32)
```
CoreML config:
```python
ct.convert(
    exported,
    inputs=[ct.TensorType(shape=x.shape)],
    convert_to="mlprogram",
    minimum_deployment_target=ct.target.iOS18,
    compute_units=ct.ComputeUnit.CPU_ONLY,
    compute_precision=ct.precision.FLOAT32,
)
```
Export path:
```python
exported = torch.export.export(model, (x,))
exported = exported.run_decompositions()
```
Observed Behavior
track_running_stats=False
This succeeds end to end:
- torch.export.export: OK
- run_decompositions(): OK
- ct.convert(...): OK
- mlmodel.predict(...): OK
Output parity is good:
```
max_abs_diff = 1.430511474609375e-06
mean_abs_diff = 5.992006890664925e-08
```
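The parity numbers above are the usual elementwise absolute-difference metrics; a minimal numpy sketch (the array names pt_out and ml_out are placeholders for the PyTorch and CoreML outputs, not names from the repro script):

```python
import numpy as np


def parity_metrics(pt_out, ml_out):
    """Max and mean absolute elementwise difference between two outputs."""
    diff = np.abs(np.asarray(pt_out, dtype=np.float64) -
                  np.asarray(ml_out, dtype=np.float64))
    return diff.max(), diff.mean()


# Example with synthetic data:
max_abs, mean_abs = parity_metrics(np.zeros((2, 3)),
                                   np.full((2, 3), 1e-6))
# both values are ~1e-06 here
```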
track_running_stats=True
This fails at conversion:
```
NotImplementedError: Unsupported fx node alias, kind alias
```
So the “off” mode works, while the “on” mode fails for the same backend and otherwise similar setup.
Why this seems notable
This does not look like a generic InstanceNorm3d failure, because the same repro succeeds when track_running_stats=False.
That makes it seem specifically tied to the running-stats version of the module, possibly due to how buffers / aliasing are represented after export + decomposition.
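For context on why the two settings can trace different graphs: in eval mode, track_running_stats=False normalizes each sample with its own per-channel spatial statistics, while track_running_stats=True normalizes with the stored running_mean/running_var buffers, so the buffer reads themselves end up in the exported graph. A numpy sketch of the standard InstanceNorm semantics (affine=False; eps matches the PyTorch default, and this is a reference illustration, not the decomposed graph):

```python
import numpy as np


def instance_norm_eval(x, running_mean=None, running_var=None, eps=1e-5):
    """InstanceNorm (affine=False) in eval mode, numpy reference.

    x: (N, C, *spatial). With no running stats, each (n, c) slice is
    normalized by its own spatial mean/var; with running stats, the
    per-channel buffers are used instead.
    """
    axes = tuple(range(2, x.ndim))  # spatial axes
    if running_mean is None:
        mean = x.mean(axis=axes, keepdims=True)
        var = x.var(axis=axes, keepdims=True)
    else:
        shape = (1, -1) + (1,) * (x.ndim - 2)
        mean = np.asarray(running_mean).reshape(shape)
        var = np.asarray(running_var).reshape(shape)
    return (x - mean) / np.sqrt(var + eps)
```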
Expected Behavior
I’d expect one of these:
- both configurations convert and run, or
- the unsupported case is rejected with an explicit, documented error
Right now it looks surprising that toggling track_running_stats flips the CoreML CPU FP32 result from “works with good parity” to “conversion failure”.
Environment
- PyTorch 2.10.0
- coremltools 9.0
- Python 3.11
- macOS / Apple Silicon