Description
What happened?
I've been working on creating a ChunkManagerEntrypoint class for Arkouda to interoperate with XArray.
In experimenting with this, I've reimplemented this example, applying .chunk(chunked_array_type="arkouda")
both when creating the initial dataset:
ds = xr.tutorial.open_dataset("rasm").load().chunk(chunked_array_type="arkouda")
and doing the step to "calculate the weights by grouping by 'time.season'":
weights = (
month_length.groupby("time.season") / month_length.groupby("time.season").sum()
).chunk(chunked_array_type="arkouda")
My intention is for the remainder of the code to be exactly the same as the example.
For ds
, the chunking operation worked as I expected, where all the numerical arrays were converted to Arkouda Arrays, and the time array was left in its original (numpy array of objects) format:
However, when trying to do the same chunking operation with weights
(a DataArray), XArray attempts to create an Arkouda array from the time coordinate, resulting in an error.
As a workaround, I can manually create the DataArray, building an arkouda array from weights.data
, and copying the coords unchanged:
from arkouda import array_api as xp
weights = month_length.groupby("time.season") / month_length.groupby("time.season").sum()
weights = xr.DataArray(
xp.asarray(weights.data),
coords=weights.coords,
)
What did you expect to happen?
I'm not sure whether should be a bug or a feature request, but it would be an improvement for this use-case if DataArray.chunk
was able to leave object
coordinates in their original format instead of attempting to convert them to the chunked array type. I.e., essentially doing what the manual workaround above does.
Minimal Complete Verifiable Example
No response
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[6], line 16
6 weights
8 # # Test that the sum of the weights for each season is 1.0
9 # np.testing.assert_allclose(weights.groupby("time.season").sum().values, np.ones(4))
10
(...)
13 # # coords=weights.coords,
14 # # )
---> 16 weights = weights.chunk(chunked_array_type="arkouda")
18 # print(weights.data)
19
20 # ds_weighted = (ds * weightsAk).groupby("time.season").sum(dim="time")
21 # ds_weighted
File ~/anaconda3/envs/xarray-tests/lib/python3.10/site-packages/xarray/util/deprecation_helpers.py:115, in _deprecate_positional_args.<locals>._decorator.<locals>.inner(*args, **kwargs)
111 kwargs.update({name: arg for name, arg in zip_args})
113 return func(*args[:-n_extra_args], **kwargs)
--> 115 return func(*args, **kwargs)
File ~/anaconda3/envs/xarray-tests/lib/python3.10/site-packages/xarray/core/dataarray.py:1419, in DataArray.chunk(self, chunks, name_prefix, token, lock, inline_array, chunked_array_type, from_array_kwargs, **chunks_kwargs)
1416 else:
1417 chunks = either_dict_or_kwargs(chunks, chunks_kwargs, "chunk")
-> 1419 ds = self._to_temp_dataset().chunk(
1420 chunks,
1421 name_prefix=name_prefix,
1422 token=token,
1423 lock=lock,
1424 inline_array=inline_array,
1425 chunked_array_type=chunked_array_type,
1426 from_array_kwargs=from_array_kwargs,
1427 )
1428 return self._from_temp_dataset(ds)
File ~/anaconda3/envs/xarray-tests/lib/python3.10/site-packages/xarray/core/dataset.py:2733, in Dataset.chunk(self, chunks, name_prefix, token, lock, inline_array, chunked_array_type, from_array_kwargs, **chunks_kwargs)
2730 if from_array_kwargs is None:
2731 from_array_kwargs = {}
-> 2733 variables = {
2734 k: _maybe_chunk(
2735 k,
2736 v,
2737 chunks_mapping,
2738 token,
2739 lock,
2740 name_prefix,
2741 inline_array=inline_array,
2742 chunked_array_type=chunkmanager,
2743 from_array_kwargs=from_array_kwargs.copy(),
2744 )
2745 for k, v in self.variables.items()
2746 }
2747 return self._replace(variables)
File ~/anaconda3/envs/xarray-tests/lib/python3.10/site-packages/xarray/core/dataset.py:2734, in <dictcomp>(.0)
2730 if from_array_kwargs is None:
2731 from_array_kwargs = {}
2733 variables = {
-> 2734 k: _maybe_chunk(
2735 k,
2736 v,
2737 chunks_mapping,
2738 token,
2739 lock,
2740 name_prefix,
2741 inline_array=inline_array,
2742 chunked_array_type=chunkmanager,
2743 from_array_kwargs=from_array_kwargs.copy(),
2744 )
2745 for k, v in self.variables.items()
2746 }
2747 return self._replace(variables)
File ~/anaconda3/envs/xarray-tests/lib/python3.10/site-packages/xarray/core/dataset.py:321, in _maybe_chunk(name, var, chunks, token, lock, name_prefix, overwrite_encoded_chunks, inline_array, chunked_array_type, from_array_kwargs)
312 name2 = f"{name_prefix}{name}-{token2}"
314 from_array_kwargs = utils.consolidate_dask_from_array_kwargs(
315 from_array_kwargs,
316 name=name2,
317 lock=lock,
318 inline_array=inline_array,
319 )
--> 321 var = var.chunk(
322 chunks,
323 chunked_array_type=chunked_array_type,
324 from_array_kwargs=from_array_kwargs,
325 )
327 if overwrite_encoded_chunks and var.chunks is not None:
328 var.encoding["chunks"] = tuple(x[0] for x in var.chunks)
File ~/anaconda3/envs/xarray-tests/lib/python3.10/site-packages/xarray/core/variable.py:2598, in Variable.chunk(self, chunks, name, lock, inline_array, chunked_array_type, from_array_kwargs, **chunks_kwargs)
2590 # TODO deprecate passing these dask-specific arguments explicitly. In future just pass everything via from_array_kwargs
2591 _from_array_kwargs = consolidate_dask_from_array_kwargs(
2592 from_array_kwargs,
2593 name=name,
2594 lock=lock,
2595 inline_array=inline_array,
2596 )
-> 2598 return super().chunk(
2599 chunks=chunks,
2600 chunked_array_type=chunked_array_type,
2601 from_array_kwargs=_from_array_kwargs,
2602 **chunks_kwargs,
2603 )
File ~/anaconda3/envs/xarray-tests/lib/python3.10/site-packages/xarray/namedarray/core.py:821, in NamedArray.chunk(self, chunks, chunked_array_type, from_array_kwargs, **chunks_kwargs)
818 if is_dict_like(chunks):
819 chunks = tuple(chunks.get(n, s) for n, s in enumerate(ndata.shape)) # type: ignore[assignment]
--> 821 data_chunked = chunkmanager.from_array(ndata, chunks, **from_array_kwargs) # type: ignore[arg-type]
823 return self._replace(data=data_chunked)
File ~/.../arkouda_xarray/arkoudamanager.py:55, in ArkoudaManager.from_array(self, data, chunks, **kwargs)
53 print("first elem: ", data[0])
54 print("type of first elem: ", type(data[0]))
---> 55 arr = aa.zeros(data.shape, dtype=data.dtype)
57 # copy data into array one chunk at a time
58 n_chunks_per_dim = [data.shape[i] // chunk_sizes[i] for i in range(data.ndim)]
File ~/.../arkouda/array_api/_creation_functions.py:330, in zeros(shape, dtype, device)
327 dtype = akdtype(dtype) # normalize dtype
328 dtype_name = cast(np.dtype, dtype).name
--> 330 repMsg = generic_msg(
331 cmd=f"create{ndim}D",
332 args={
333 "dtype": dtype_name,
334 "shape": shape,
335 "value": 0,
336 },
337 )
339 return Array._new(create_pdarray(repMsg))
File ~/.../arkouda/client.py:1006, in generic_msg(cmd, args, payload, send_binary, recv_binary)
1004 else:
1005 assert payload is None
-> 1006 return cast(Channel, channel).send_string_message(
1007 cmd=cmd, args=msg_args, size=size, recv_binary=recv_binary
1008 )
1009 except KeyboardInterrupt as e:
1010 # if the user interrupts during command execution, the socket gets out
1011 # of sync reset the socket before raising the interrupt exception
1012 cast(Channel, channel).connect(timeout=0)
File ~/.../arkouda/client.py:529, in ZmqChannel.send_string_message(self, cmd, recv_binary, args, size, request_id)
527 # raise errors or warnings sent back from the server
528 if return_message.msgType == MessageType.ERROR:
--> 529 raise RuntimeError(return_message.msg)
530 elif return_message.msgType == MessageType.WARNING:
531 warnings.warn(return_message.msg)
RuntimeError: Error: server not configured to support 'undef' (in createMsg). Please update the configuration and recompile.
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:51:49) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2024.5.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.0
netCDF4: 1.6.5
pydap: installed
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.0
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: 3.9.0
bottleneck: 1.3.8
dask: 2024.5.0
distributed: 2024.5.0
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.3.1
cupy: None
pint: None
sparse: 0.15.1
flox: 0.9.7
numpy_groupies: 0.11.1
setuptools: 69.5.1
pip: 24.0
conda: None
pytest: 8.2.0
mypy: None
IPython: 8.24.0
sphinx: None