Inconsistent behavior when using xr.concat() with pd.Index vs xr.DataArray as dim parameter for a new dimension #5992

Closed
raj-magesh opened this issue Nov 17, 2021 · 3 comments
raj-magesh commented Nov 17, 2021

summary

Using an xr.DataArray as the dim parameter to xr.concat does not name the new dimension with the name of the given DataArray (a generic dim_0 is used instead), though the coordinate information is captured correctly. This unexpected behavior does not happen when using a pd.Index as the dim parameter.

minimal reproducible example

import numpy as np
import pandas as pd
import xarray as xr

xarr1 = xr.DataArray(np.ones((2, 3)), dims=('existing_dim_0', 'existing_dim_1'))
xarr2 = xr.DataArray(np.ones((2, 3)), dims=('existing_dim_0', 'existing_dim_1'))

new_coord = ['xarr1', 'xarr2']

concatenating using pd.Index as dim parameter

xr.concat(
    [xarr1, xarr2],
    dim=pd.Index(new_coord, name='new_dim'),
)

Output: as expected, the new dimension has the name new_dim and the specified coordinate information

<xarray.DataArray (new_dim: 2, existing_dim_0: 2, existing_dim_1: 3)>
array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])
Coordinates:
  * new_dim  (new_dim) object 'xarr1' 'xarr2'
Dimensions without coordinates: existing_dim_0, existing_dim_1

concatenating using xr.DataArray as dim parameter

xr.concat(
    [xarr1, xarr2],
    dim=xr.DataArray(new_coord, name='new_dim'),
)

Output: inconsistent with the pd.Index method, the new dimension does NOT have the specified name new_dim, though it has the coordinate information

<xarray.DataArray (dim_0: 2, existing_dim_0: 2, existing_dim_1: 3)>
array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])
Coordinates:
    new_dim  (dim_0) <U5 'xarr1' 'xarr2'
Dimensions without coordinates: dim_0, existing_dim_0, existing_dim_1
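One way to sidestep the difference between the two inputs is to pass the new dimension's name as a plain string and attach the labels afterwards. The following is a minimal sketch rather than a recommended fix; it reuses the arrays from the example above and assumes only the standard `xr.concat` and `DataArray.assign_coords` calls.

```python
import numpy as np
import xarray as xr

xarr1 = xr.DataArray(np.ones((2, 3)), dims=("existing_dim_0", "existing_dim_1"))
xarr2 = xr.DataArray(np.ones((2, 3)), dims=("existing_dim_0", "existing_dim_1"))

new_coord = ["xarr1", "xarr2"]

# Name the new dimension explicitly with a string, so there is no
# ambiguity about which name concat should use.
stacked = xr.concat([xarr1, xarr2], dim="new_dim")

# Attach the labels as a coordinate along the new dimension.
result = stacked.assign_coords(new_dim=new_coord)

print(result.dims)
print(result["new_dim"].values)
```

This keeps the dimension name and the coordinate labels as two separate, explicit steps.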

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.9.5 | packaged by conda-forge | (default, Jun 19 2021, 00:32:32)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.31.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.20.1
pandas: 1.3.1
numpy: 1.21.1
scipy: 1.7.0
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: 3.3.0
Nio: None
zarr: None
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.09.0
distributed: None
matplotlib: 3.4.2
cartopy: None
seaborn: 0.11.1
numbagg: None
fsspec: 2021.08.1
cupy: None
pint: None
sparse: None
setuptools: 49.6.0.post20210108
pip: 21.3.1
conda: None
pytest: None
IPython: 7.24.1
sphinx: None

raj-magesh changed the title from "Inconsistent behavior when using xr.concat() with pd.Index vs xd.DataArray as dim parameter for a new dimension" to "Inconsistent behavior when using xr.concat() with pd.Index vs xr.DataArray as dim parameter for a new dimension" on Nov 23, 2021
max-sixty added the bug label on Apr 7, 2022
@max-sixty
Collaborator

Thanks @raj-magesh, you're correct. We would definitely take a pull request to fix this.

@dcherian
Contributor

dcherian commented Apr 7, 2022

This looks like a duplicate of #1646.

The design question is that a DataArray passed as dim carries two names to choose from: the name of the DataArray itself, and the name of the dimension inside the DataArray.
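To make the two candidate names concrete, here is a small sketch that gives the DataArray passed as `dim` a different array name and dimension name. The names `array_name` and `dim_name` are made up for illustration. Based on the output reported in this issue, the new dimension appears to take the DataArray's dimension name rather than its array name; how the attached coordinate is named may differ between xarray versions, so the sketch only inspects the dimensions.

```python
import numpy as np
import xarray as xr

xarr1 = xr.DataArray(np.ones((2, 3)), dims=("existing_dim_0", "existing_dim_1"))
xarr2 = xr.DataArray(np.ones((2, 3)), dims=("existing_dim_0", "existing_dim_1"))

# The DataArray passed as `dim` carries two names:
#   - its own name ("array_name", hypothetical)
#   - the name of its single dimension ("dim_name", hypothetical)
labels = xr.DataArray(["xarr1", "xarr2"], dims=("dim_name",), name="array_name")

result = xr.concat([xarr1, xarr2], dim=labels)

# Inspect which of the two names ended up as the new dimension.
print(result.dims)
```

In the original report no `dims` were given, so the DataArray's dimension defaulted to `dim_0`, which is the name that showed up in the result.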

@max-sixty
Collaborator

Thanks @dcherian. Closing as duplicate.
