Behaviour from Dataset.broadcast_like is strange and inconsistent with how arithmetic ops on Datasets actually broadcast #10031

Closed
5 tasks done
mjwillson opened this issue Feb 6, 2025 · 7 comments

Comments

@mjwillson

mjwillson commented Feb 6, 2025

What happened?

dataset = xarray.Dataset({'x': (('a',), np.zeros(2)),
                          'y': (('b',), np.zeros(3))})

result = dataset.broadcast_like(dataset)   # same problem for xarray.broadcast also
result.x.dims
=> ('a', 'b')
result.y.dims
=> ('a', 'b')

What did you expect to happen?

I expected the shape to be consistent with how an actual arithmetic operation broadcasts:

result = dataset + dataset
result.x.dims
=> ('a',)
result.y.dims
=> ('b',)

This looks like a bug to me, but if the behaviour is intentional, can we please document the reason and draw attention to it with big alarm bells in the docs, as it's very unexpected and can lead to undesired blow-ups in the size of arrays.

Either way can we please have a version of the broadcast API which broadcasts datasets in the same way that arithmetic operations broadcast them?
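As an illustration of what such an API might do, here is a minimal sketch that broadcasts each data variable only against the variable of the same name, rather than against every dimension in the Dataset. The helper name `broadcast_like_per_variable` and the second dataset `other` are hypothetical and not part of xarray; only `DataArray.broadcast_like` is an existing call.

```python
import numpy as np
import xarray as xr


def broadcast_like_per_variable(ds: xr.Dataset, other: xr.Dataset) -> xr.Dataset:
    """Hypothetical helper: broadcast each variable against its namesake only."""
    out = {}
    for name, var in ds.data_vars.items():
        if name in other.data_vars:
            # DataArray.broadcast_like only adds dimensions of this one variable.
            out[name] = var.broadcast_like(other[name])
        else:
            # No counterpart in `other`: leave the variable untouched.
            out[name] = var
    return xr.Dataset(out, attrs=ds.attrs)


dataset = xr.Dataset({"x": (("a",), np.zeros(2)), "y": (("b",), np.zeros(3))})
# Illustrative second dataset: 'x' gains an extra dimension 'c', 'y' is unchanged.
other = xr.Dataset({"x": (("a", "c"), np.zeros((2, 4))), "y": (("b",), np.zeros(3))})

result = broadcast_like_per_variable(dataset, other)
print(set(result.x.dims))  # {'a', 'c'}
print(result.y.dims)       # ('b',)
```

With this approach `y` keeps its single dimension `b`, which matches what `dataset + other` would do for that variable. Whether xarray itself should expose something like this is exactly the open question in this issue.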

Minimal Complete Verifiable Example

import xarray
import numpy as np
dataset = xarray.Dataset({'x': (('a',), np.zeros(2)),
                          'y': (('b',), np.zeros(3))})
result = dataset.broadcast_like(dataset)
result.x.dims, result.y.dims

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Anything else we need to know?

This is related to #6549, which has been open as a feature request for ~3 years, although it is not quite the same. Opening a bug anyway with a more focused / minimal illustration of why this makes very little sense.

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.8 (stable, redacted, redacted) [Clang 9999.0.0 (4018317407006b2c632fbb75729de624a2426439)]
python-bits: 64
OS: Linux
OS-release: 6.10.11-1rodete2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.3
libnetcdf: 4.6.1

xarray: 2025.01.2
pandas: 2.2.3
numpy: 2.2.1
scipy: 1.13.1
netCDF4: 1.4.1
pydap: None
h5netcdf: 999
h5py: 3.11.0
zarr: 2.18.2
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.9.1
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 0.dev0+unknown
pip: None
conda: None
pytest: None
mypy: None
IPython: 7.34.0
sphinx: None

@mjwillson added the bug and needs triage labels Feb 6, 2025
@dcherian added the usage question label and removed the bug and needs triage labels Feb 6, 2025
@dcherian
Contributor

dcherian commented Feb 6, 2025

Broadcasting in the way you request is a no-op in Xarray-land, so you don't need broadcast_like.

  1. If two Xarray objects have dimensions of different names (your example), they are automatically broadcastable against each other. Nothing needs to be done.
  2. If they share a dimension (which means said dimension has the same name), then that dimension is checked for alignment. Use xr.align with its join option to do the checking / alignment you need.
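The two rules above can be sketched as follows. The arrays and the dimension name `t` are made up for illustration; `xr.align` and its `join` option are the existing xarray API mentioned in the list.

```python
import numpy as np
import xarray as xr

# Rule 1: differently named dimensions broadcast automatically in arithmetic.
x = xr.DataArray(np.zeros(2), dims="a")
y = xr.DataArray(np.zeros(3), dims="b")
total = x + y
print(total.dims)  # ('a', 'b')

# Rule 2: a shared dimension name is aligned on its coordinate labels.
left = xr.DataArray([1.0, 2.0, 3.0], dims="t", coords={"t": [0, 1, 2]})
right = xr.DataArray([10.0, 20.0, 30.0], dims="t", coords={"t": [1, 2, 3]})

# join="inner" keeps only the labels present in both objects.
inner_left, inner_right = xr.align(left, right, join="inner")
print(inner_left.t.values)  # [1 2]

# join="outer" keeps the union of labels and fills the gaps with NaN.
outer_left, outer_right = xr.align(left, right, join="outer")
print(outer_left.t.values)  # [0 1 2 3]
```

Other `join` values such as `"left"`, `"right"` and `"exact"` control which labels survive, or raise if the labels differ.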

What we don't support yet is allowing broadcast to insert a size-1 unlabeled dimension of the same name. I have found this useful in combination with apply_ufunc (https://github.com/xarray-contrib/flox/blob/ca576812e78b3978421eace6e9dde5a76729ebcc/flox/xarray.py#L45-L62) so that you can avoid useless extra work in a downstream function that takes unlabeled arrays. This would also require #2171.

Admittedly, our documentation on this should be a lot better. See https://tutorial.xarray.dev/fundamentals/02.3_aligning_data_objects.html.

cc @headtr1ck

@dcherian
Contributor

dcherian commented Feb 6, 2025

Apologies, now I get it after re-reading #6549 a few times. IIUC you'd like the behaviour of broadcast_like to broadcast like variables against each other. Shall we close in favor of #6549?

@mjwillson
Author

mjwillson commented Feb 6, 2025

> Broadcasting in the way you request is a no-op in Xarray-land, so you don't need broadcast_like.

So the example above was just a minimal reproducer to illustrate the nonsensical / inconsistent behaviour. In practice we run into related issues when broadcasting different datasets in less trivial cases.

Also -- you say "Broadcasting in the way you request is a no-op" -- I agree it should be a no-op, but the above clearly illustrates that it isn't, which is kind of the point here right?

The other ticket is about broadcasting Dataset against DataArray, I think it's likely the same underlying cause, but if I had to summarize the overall problem, it's that behaviour of xarray.broadcast (and Dataset.broadcast_like etc) is not consistent with how actual arithmetic operations broadcast, in cases where Datasets are involved.
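To make the inconsistency concrete, here is a small sketch comparing the number of elements produced by an arithmetic operation with the number produced by `xr.broadcast` on the same Dataset. The sizes 100 and 200 are arbitrary, and the broadcast count assumes the behaviour reported in this issue (xarray 2025.01.2); it may differ in other versions.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"x": (("a",), np.zeros(100)), "y": (("b",), np.zeros(200))})

# Arithmetic keeps each variable on its own dimensions.
summed = ds + ds
arithmetic_elements = sum(v.size for v in summed.data_vars.values())

# xr.broadcast expands every variable to the union of all Dataset dimensions
# (behaviour as reported in this issue; assumed here, not guaranteed).
expanded, _ = xr.broadcast(ds, ds)
broadcast_elements = sum(v.size for v in expanded.data_vars.values())

print(arithmetic_elements)  # 300
print(broadcast_elements)   # 40000 under the reported behaviour
```

The gap grows with the product of the dimension sizes, which is where the size blow-ups mentioned in the original report come from.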

@dcherian
Contributor

dcherian commented Feb 6, 2025

Thanks for the clarification. Yes, I misunderstood your initial post. Apologies for that. I'm closing in favor of #6549.

@mjwillson
Author

mjwillson commented Feb 6, 2025

No worries. Is it intentional that xr.broadcast / broadcast_like is inconsistent with how arithmetic ops actually broadcast? If not I'd suggest making #6549 a bug not a FR.

@dcherian
Contributor

dcherian commented Feb 6, 2025

> Is it intentional that xr.broadcast / broadcast_like is inconsistent with how arithmetic ops actually broadcast?

broadcast is quite old, so presumably it's intentional. IIUC I added broadcast_like as a handy alias to broadcast, so it inherited that behaviour (unintentionally). Seems to me that _like implies stricter behaviour by default.

@mjwillson
Author

If it is intentional it would be good to confirm, and to document what the behaviour is with Datasets. I rather suspect it's behaviour that most people wouldn't want or expect, although I may be missing the original motivation.
