Behaviour from Dataset.broadcast_like is strange and inconsistent with how arithmetic ops on Datasets actually broadcast #10031

Closed
@mjwillson

What happened?

dataset = xarray.Dataset({'x': (('a',), np.zeros(2)),
                          'y': (('b',), np.zeros(3))})

result = dataset.broadcast_like(dataset)   # same problem for xarray.broadcast also
result.x.dims
=> ('a', 'b')
result.y.dims
=> ('a', 'b')

What did you expect to happen?

I expected the shape to be consistent with how an actual arithmetic operation broadcasts:

result = dataset + dataset
result.x.dims
=> ('a',)
result.y.dims
=> ('b',)

This looks like a bug to me, but if the behaviour is intentional, can we please document the reason and draw attention to it with big alarm bells in the docs? It is very unexpected and can lead to undesired blow-ups in the size of arrays.
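
To make the size concern concrete, here is a rough sketch that compares the number of elements per variable before and after `broadcast_like`. The dimension lengths (100 and 200) are arbitrary values chosen for illustration. On the version reported in this issue, each variable would be expected to grow from its own length to the product of both lengths; other versions may behave differently, so no output is shown.

```python
import numpy as np
import xarray

# Two variables on unrelated dimensions; sizes are illustrative only.
dataset = xarray.Dataset({'x': (('a',), np.zeros(100)),
                          'y': (('b',), np.zeros(200))})

result = dataset.broadcast_like(dataset)

# Element counts per variable, before and after broadcasting.
before = {name: var.size for name, var in dataset.data_vars.items()}
after = {name: var.size for name, var in result.data_vars.items()}

for name in before:
    print(name, before[name], '->', after[name], result[name].dims)
```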

Either way, can we please have a version of the broadcast API that broadcasts datasets in the same way that arithmetic operations broadcast them?
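
As a possible workaround in the meantime, a per-variable broadcast can be sketched with `DataArray.broadcast_like`, which does exist in xarray. The helper name `broadcast_like_arithmetic` is hypothetical and not part of xarray, and its handling of variables missing from `other` (left unchanged) is an assumption that differs from what arithmetic does with mismatched variables.

```python
import numpy as np
import xarray


def broadcast_like_arithmetic(ds, other):
    """Hypothetical helper (not an xarray API): broadcast each data
    variable only against the same-named variable in `other`."""
    out = {}
    for name, var in ds.data_vars.items():
        if name in other.data_vars:
            # Only the dimensions of the matching variable are added.
            out[name] = var.broadcast_like(other[name])
        else:
            # Assumption: variables without a counterpart are kept as-is.
            out[name] = var
    return xarray.Dataset(out, attrs=ds.attrs)


dataset = xarray.Dataset({'x': (('a',), np.zeros(2)),
                          'y': (('b',), np.zeros(3))})

# Broadcasting against itself leaves each variable on its own dimension.
same = broadcast_like_arithmetic(dataset, dataset)

# A template whose 'x' has an extra dimension 'c' only affects 'x'.
template = xarray.Dataset({'x': (('a', 'c'), np.zeros((2, 4))),
                           'y': (('b',), np.zeros(3))})
expanded = broadcast_like_arithmetic(dataset, template)
```

With this sketch, `y` never picks up dimension `a` or `c`, because it is only ever compared with the `y` of the other dataset.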

Minimal Complete Verifiable Example

import xarray
import numpy as np
dataset = xarray.Dataset({'x': (('a',), np.zeros(2)),
                          'y': (('b',), np.zeros(3))})
result = dataset.broadcast_like(dataset)
result.x.dims, result.y.dims

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Anything else we need to know?

This is related to #6549, which has been open as a feature request for ~3 years, although it is not quite the same issue. I'm opening a bug anyway, with a more focused and minimal illustration of why the current behaviour makes very little sense.

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.8 (stable, redacted, redacted) [Clang 9999.0.0 (4018317407006b2c632fbb75729de624a2426439)]
python-bits: 64
OS: Linux
OS-release: 6.10.11-1rodete2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.3
libnetcdf: 4.6.1

xarray: 2025.01.2
pandas: 2.2.3
numpy: 2.2.1
scipy: 1.13.1
netCDF4: 1.4.1
pydap: None
h5netcdf: 999
h5py: 3.11.0
zarr: 2.18.2
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.9.1
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 0.dev0+unknown
pip: None
conda: None
pytest: None
mypy: None
IPython: 7.34.0
sphinx: None
