
DataArray.rolling fails with chunk size of 1 or 2 #9862


Closed
5 tasks done
paololucchino opened this issue Dec 6, 2024 · 7 comments
Comments

@paololucchino

What happened?

I hit the error below when computing a rolling mean with a window of length 5 along time over an (x, y, time) cube of climate data whose chunks span all of x and y but only one time step.

ValueError: Moving window (=5) must between 1 and 4, inclusive

What did you expect to happen?

We would expect the rolling mean to calculate correctly.

Minimal Complete Verifiable Example

import dask.array as da
import xarray as xr
import numpy as np

# Dimensions and sizes
nx, ny, nt = 100, 200, 50  # size of x, y, and time dimensions
x = np.linspace(0, 10, nx)  # x-coordinates
y = np.linspace(0, 20, ny)  # y-coordinates
time = np.linspace(0, 1, nt)  # time coordinates

# Generate a random Dask array with lazy computation
data = da.random.random(size=(nx, ny, nt), chunks=(100, 200, 1))

# Create an xarray DataArray with coordinates and attributes
data_array = xr.DataArray(
    data,
    dims=["x", "y", "time"],
    coords={"x": x, "y": y, "time": time},
    name="dummy_data",
    attrs={"units": "arbitrary", "description": "Dummy 3D dataset"}
)

d_rolling = data_array.rolling(time=5).mean()
d_rolling.compute()

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/paolo/miniforge3/envs/hip-analysis-dev-env/lib/python3.10/site-packages/xarray/core/dataarray.py", line 1207, in compute
    return new.load(**kwargs)
  File "/Users/paolo/miniforge3/envs/hip-analysis-dev-env/lib/python3.10/site-packages/xarray/core/dataarray.py", line 1175, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File "/Users/paolo/miniforge3/envs/hip-analysis-dev-env/lib/python3.10/site-packages/xarray/core/dataset.py", line 899, in load
    evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
  File "/Users/paolo/miniforge3/envs/hip-analysis-dev-env/lib/python3.10/site-packages/xarray/namedarray/daskmanager.py", line 85, in compute
    return compute(*data, **kwargs)  # type: ignore[no-untyped-call, no-any-return]
  File "/Users/paolo/miniforge3/envs/hip-analysis-dev-env/lib/python3.10/site-packages/dask/base.py", line 660, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/Users/paolo/miniforge3/envs/hip-analysis-dev-env/lib/python3.10/site-packages/dask/_task_spec.py", line 739, in __call__
    return self.func(*new_argspec, **kwargs)
ValueError: Moving window (=5) must between 1 and 4, inclusive

Anything else we need to know?

The error occurs only when the chunk size along time is 1 (as in the example above) or 2, as in the following snippet:

data_array = data_array.chunk(time=2)
d_rolling = data_array.rolling(time=3).mean()
d_rolling.compute()

Rechunking time to a size of 3 or more, or to -1 (a single chunk), computes without error.
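
As a stopgap, the rechunking observation above can be wrapped in a small helper. This is a minimal sketch, not a fix: `rolling_mean_single_chunk` is a hypothetical name, the cube is a shrunken stand-in for the one in the report, and collapsing time into one chunk is assumed to be affordable in memory for the data at hand.

```python
import dask.array as da
import xarray as xr


def rolling_mean_single_chunk(arr: xr.DataArray, dim: str, window: int) -> xr.DataArray:
    """Hypothetical helper: merge all chunks along `dim`, then take the rolling mean."""
    # chunk({dim: -1}) collapses `dim` into one chunk; other dimensions keep their chunking.
    return arr.chunk({dim: -1}).rolling({dim: window}).mean()


# Small stand-in for the reported cube, chunked with one time step per chunk.
cube = xr.DataArray(
    da.random.random(size=(4, 5, 50), chunks=(4, 5, 1)),
    dims=["x", "y", "time"],
)

result = rolling_mean_single_chunk(cube, "time", 5).compute()
```

With the default `min_periods`, the first four time steps of `result` are NaN and the rest hold the 5-step means.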

The error started happening when I updated my environment. See below.

Possibly related to #4922?

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.16 | packaged by conda-forge | (main, Dec 5 2024, 14:12:04) [Clang 18.1.8 ]
python-bits: 64
OS: Darwin
OS-release: 23.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.11.0
pandas: 2.2.3
numpy: 1.26.4
scipy: 1.14.1
netCDF4: 1.7.2
pydap: None
h5netcdf: None
h5py: 3.12.1
zarr: 2.18.3
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: 1.4.2
dask: 2024.12.0
distributed: 2024.12.0
matplotlib: 3.9.3
cartopy: 0.24.0
seaborn: 0.13.2
numbagg: None
fsspec: 2024.10.0
cupy: None
pint: 0.24.4
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.6.0
pip: 24.3.1
conda: installed
pytest: 8.3.4
mypy: 1.13.0
IPython: 8.30.0
sphinx: None

@paololucchino paololucchino added bug needs triage Issue that has not been reviewed by xarray team member labels Dec 6, 2024

welcome bot commented Dec 6, 2024

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@dcherian
Contributor

dcherian commented Dec 6, 2024

I think this is an upstream dask issue: dask/dask#11580 though clearly we are missing a test.
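
A check along the following lines could cover the missing case. It is only a sketch under assumptions: `check_rolling_mean` is a hypothetical helper, not part of the xarray test suite, and it records a `ValueError` instead of failing so that it can be run against both affected and unaffected environments.

```python
import dask.array as da
import numpy as np
import xarray as xr


def check_rolling_mean(chunk_sizes, window=3, n=10):
    """Report, per chunk size, whether the dask-backed rolling mean matches the eager one."""
    values = np.arange(n, dtype=float)
    # Reference result computed eagerly on a NumPy-backed array.
    expected = xr.DataArray(values, dims=["time"]).rolling(time=window).mean()
    report = {}
    for size in chunk_sizes:
        lazy = xr.DataArray(da.from_array(values, chunks=size), dims=["time"])
        try:
            actual = lazy.rolling(time=window).mean().compute()
            # assert_allclose treats NaNs in matching positions as equal by default.
            np.testing.assert_allclose(actual.values, expected.values)
            report[size] = "ok"
        except ValueError as err:
            report[size] = f"ValueError: {err}"
    return report


report = check_rolling_mean([1, 2, 3, -1])
for size, status in report.items():
    print(f"chunks={size}: {status}")
```

In a real test suite the `try`/`except` would be dropped so that any chunk size raising an error fails the test.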

@dcherian dcherian added topic-dask upstream issue topic-rolling and removed needs triage Issue that has not been reviewed by xarray team member labels Dec 6, 2024
@phofl
Contributor

phofl commented Dec 17, 2024

I do have a fix on the dask side that should be good to go soon

@dcherian dcherian removed the bug label Dec 17, 2024
@phofl
Contributor

phofl commented Dec 17, 2024

The dask pr is merged and will go out with the release today

@pittwolfe

The issue has reappeared. Running the example from the initial report produces the same error (ValueError: Moving window (=5) must between 1 and 4, inclusive) with the same traceback, but I'm using:

  • dask: 2025.2.0
  • xarray: 2025.1.0

As in the original report, the error persists when the example is pasted into a new console or Binder.

@paololucchino
Author

paololucchino commented Apr 11, 2025

I'm still struggling with this issue, but I think I've found the cause: it is indeed bottleneck, potentially #4922 as mentioned above. I'm not sure whether this needs to be addressed here or in bottleneck.

Minimal code example

import dask.array as da
import numpy as np
import xarray as xr

arr = np.random.rand(10)
dask_da = xr.DataArray(da.from_array(arr, chunks=2), dims=["time"])
try:
    result = dask_da.rolling(time=3).mean().compute()
    print(result)
except ValueError as e:
    print(f"Error: {e}")

With bottleneck

Output:
Error: Moving window (=3) must between 1 and 2, inclusive

Env:

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.16 | packaged by conda-forge | (main, Dec  5 2024, 14:12:04) [Clang 18.1.8 ]
python-bits: 64
OS: Darwin
OS-release: 24.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2025.1.1
pandas: 2.2.3
numpy: 1.26.4
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: 1.4.2
dask: 2025.3.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.12.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: None
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

Without bottleneck

Output:

<xarray.DataArray 'array-40bbe4b08b23c524843c69096c8a64e6' (time: 10)> Size: 80B
array([       nan,        nan, 0.62634802, 0.36758673, 0.3529306 ,
       0.24037543, 0.36745773, 0.3835441 , 0.60209106, 0.56244482])
Dimensions without coordinates: time

Env:

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.16 | packaged by conda-forge | (main, Dec  5 2024, 14:12:04) [Clang 18.1.8 ]
python-bits: 64
OS: Darwin
OS-release: 24.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2025.1.1
pandas: 2.2.3
numpy: 1.26.4
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.3.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.12.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: None
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
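
The same comparison can probably be made without uninstalling bottleneck, by switching it off through xarray's `use_bottleneck` option. This is a sketch that assumes disabling the option takes the same code path as the "Without bottleneck" environment above; the input is deterministic here so the result is easy to check by eye.

```python
import dask.array as da
import numpy as np
import xarray as xr

values = np.arange(10, dtype=float)
lazy = xr.DataArray(da.from_array(values, chunks=2), dims=["time"])

# The option is read when the reduction is built, so keep the whole
# rolling(...).mean() call inside the context manager.
with xr.set_options(use_bottleneck=False):
    result = lazy.rolling(time=3).mean().compute()

print(result.values)
```

For this input the first two entries are NaN and the remaining entries are the 3-step means 1.0 through 8.0.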

@paololucchino
Author

Hi @dcherian, do you think this issue should be reopened? Thanks!
