arithmetic resulting in inconsistent chunks #3323

Closed
dcherian opened this issue Sep 19, 2019 · 2 comments · Fixed by #3276

Comments

@dcherian
Contributor

import xarray as xr
import numpy as np

def make_da():
    # 10x20 DataArray backed by dask, chunked as (4, 5)
    return xr.DataArray(
        np.ones((10, 20)),
        dims=["x", "y"],
        coords={"x": np.arange(10), "y": np.arange(100, 120)},
        name="a",
    ).chunk({"x": 4, "y": 5})

map_da = make_da()
map_ds = xr.Dataset()
map_ds["a"] = make_da()
map_ds["c"] = map_ds.x + 20  # "c" only has the "x" dimension
map_ds = map_ds.chunk({"x": 4, "y": 5})
map_ds
<xarray.Dataset>
Dimensions:  (x: 10, y: 20)
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9
  * y        (y) int64 100 101 102 103 104 105 106 ... 114 115 116 117 118 119
Data variables:
    a        (x, y) float64 dask.array<chunksize=(4, 5), meta=np.ndarray>
    c        (x) int64 dask.array<chunksize=(4,), meta=np.ndarray>

(map_ds + map_ds.y) gives a y chunk size of 20 for c but 5 for a:

(map_ds + map_ds.y)
<xarray.Dataset>
Dimensions:  (x: 10, y: 20)
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9
  * y        (y) int64 100 101 102 103 104 105 106 ... 114 115 116 117 118 119
Data variables:
    a        (x, y) float64 dask.array<chunksize=(4, 5), meta=np.ndarray>
    c        (x, y) int64 dask.array<chunksize=(4, 20), meta=np.ndarray>

This seems reasonable, except that (map_ds + map_ds.y).chunks raises an "Inconsistent chunks" error.
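
A minimal sketch of the failure and a manual workaround (the rechunk call below is just an illustration, not the fix proposed here):

bad = map_ds + map_ds.y

# accessing .chunks raises the "inconsistent chunks" ValueError described above,
# because "y" is chunked as (5, 5, 5, 5) on "a" but as a single chunk of 20 on "c"
try:
    bad.chunks
except ValueError as err:
    print(err)

# rechunking the result so every variable agrees along "y" makes .chunks work again
good = bad.chunk({"y": 5})
print(good.chunks)  # x: (4, 4, 2), y: (5, 5, 5, 5)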

I ran into this writing tests for map_blocks.

@shoyer
Member

shoyer commented Sep 19, 2019

I think dask array has some utility functions for "unifying chunks" that we might be able to use inside our map_blocks() function.
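
For reference, I believe the dask helper is dask.array.core.unify_chunks; a rough sketch of what it does with standalone dask arrays (the names here are illustrative):

import dask.array as darray

a = darray.ones((10, 20), chunks=(4, 5))    # like "a" above
c = darray.ones((10, 20), chunks=(4, 20))   # like the broadcast "c" above

# unify_chunks takes (array, index) pairs and returns the common chunking
# along each index plus rechunked copies of the inputs
chunkss, (a2, c2) = darray.core.unify_chunks(a, "xy", c, "xy")
print(chunkss)                  # x -> (4, 4, 2), y -> (5, 5, 5, 5)
print(a2.chunks == c2.chunks)   # True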

Potentially we could also make Dataset.chunks more robust, e.g., have it return None for dimensions with inconsistent chunk sizes rather than raising an error.
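
To sketch that second idea (purely illustrative, using only the public Variable API): a per-dimension scan that records None wherever dask-backed variables disagree, instead of raising:

def chunks_or_none(ds):
    # dim -> chunk tuple, or None if variables disagree along that dim
    chunks = {}
    for var in ds.variables.values():
        if var.chunks is None:  # not dask-backed
            continue
        for dim, dim_chunks in zip(var.dims, var.chunks):
            if dim in chunks and chunks[dim] != dim_chunks:
                chunks[dim] = None
            elif dim not in chunks:
                chunks[dim] = dim_chunks
    return chunks

print(chunks_or_none(map_ds + map_ds.y))  # y maps to None here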

Alternatively, we could enforce matching chunksizes on all dask arrays inside a Dataset, as part of xarray's model of a Dataset as a collection of aligned arrays. But this seems unnecessarily limiting, and I am reluctant to add extra complexity to xarray's data model.

@dcherian
Contributor Author

I agree with not enforcing matching chunk sizes.

I've added an ugly version of Dataset.unify_chunks in #3276. Feedback welcome!
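
If that PR lands with a Dataset.unify_chunks() method, usage on the example above would look roughly like:

unified = (map_ds + map_ds.y).unify_chunks()
print(unified.chunks)  # no longer raises; every variable shares one chunking per dim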
