
Re-indexing causes coordinates to be dropped #3438


Closed
Hoeze opened this issue Oct 23, 2019 · 2 comments · Fixed by #3645

Comments

@Hoeze

Hoeze commented Oct 23, 2019

Hi, I ran into a problem where the index is dropped when I rename a dimension and then stack it:

MCVE Code Sample

import xarray as xr

ds = xr.Dataset({
    "test": xr.DataArray(
        [[[1,2],[3,4]], [[1,2],[3,4]]], 
        dims=("genes", "observations", "subtissues"), 
        coords={
            "observations": xr.DataArray(["x-1", "y-1"], dims=("observations",)),
            "individuals": xr.DataArray(["x", "y"], dims=("observations",)), 
            "genes": xr.DataArray(["a", "b"], dims=("genes",)), 
            "subtissues": xr.DataArray(["c", "d"], dims=("subtissues",)),
        }
    )
})

After renaming, individuals is shown as the dimension coordinate:

print(ds.rename_dims(observations="individuals"))
<xarray.Dataset>
Dimensions:       (genes: 2, individuals: 2, subtissues: 2)
Coordinates:
    observations  (individuals) <U3 'x-1' 'y-1'
  * individuals   (individuals) <U1 'x' 'y'
  * genes         (genes) <U1 'a' 'b'
  * subtissues    (subtissues) <U1 'c' 'd'
Data variables:
    test          (genes, individuals, subtissues) int64 1 2 3 4 1 2 3 4

Stacking then causes the individuals values to disappear and be replaced with integers:

print(ds.rename_dims(observations="individuals").stack(observations=["individuals", "genes"]))
<xarray.Dataset>
Dimensions:       (observations: 4, subtissues: 2)
Coordinates:
  * observations  (observations) MultiIndex
  - individuals   (observations) int64 0 0 1 1
  - genes         (observations) object 'a' 'b' 'a' 'b'
  * subtissues    (subtissues) <U1 'c' 'd'
Data variables:
    test          (subtissues, observations) int64 1 1 3 3 2 2 4 4

Explicitly setting individuals as an index keeps its values after stacking:

print(
    ds.rename_dims(observations="individuals")
    .set_index({"individuals": "individuals"})
    .set_coords("individuals")
    .stack(observations=["individuals", "genes"])
)
<xarray.Dataset>
Dimensions:       (observations: 4, subtissues: 2)
Coordinates:
  * observations  (observations) MultiIndex
  - individuals   (observations) object 'x' 'x' 'y' 'y'
  - genes         (observations) object 'a' 'b' 'a' 'b'
  * subtissues    (subtissues) <U1 'c' 'd'
Data variables:
    test          (subtissues, observations) int64 1 1 3 3 2 2 4 4
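The integer labels in the first stacked output look like what happens when a dimension has no index: stacking takes the Cartesian product of each dimension's labels, and a dimension without an index falls back to integer positions. The pure-Python sketch below models only that behaviour. stacked_level_labels is a hypothetical helper, and the assumption that rename_dims left individuals without a real index in 0.14.0 is inferred from the outputs above, not confirmed from xarray's source.

```python
from itertools import product


def stacked_level_labels(dims, sizes, indexes):
    """Hypothetical model of the labels produced by stacking `dims`.

    Each dimension contributes its index values when it has an index and
    falls back to integer positions 0..n-1 otherwise; the stacked labels
    are the Cartesian product of those per-dimension labels, in order.
    """
    levels = [
        list(indexes[d]) if d in indexes else list(range(sizes[d]))
        for d in dims
    ]
    return list(product(*levels))


sizes = {"individuals": 2, "genes": 2}
order = ["individuals", "genes"]

# After rename_dims (assumed): "individuals" has no index, only "genes" does.
no_index = stacked_level_labels(order, sizes, {"genes": ["a", "b"]})

# After set_index or swap_dims: "individuals" carries its own labels.
with_index = stacked_level_labels(
    order, sizes, {"individuals": ["x", "y"], "genes": ["a", "b"]}
)

print([ind for ind, _ in no_index])    # [0, 0, 1, 1]
print([ind for ind, _ in with_index])  # ['x', 'x', 'y', 'y']
```

The per-level values match the individuals and genes rows printed in the two stacked outputs.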

Is this intended?

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.10.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: None
pytest: 5.0.1
IPython: 7.8.0
sphinx: None

@keewis
Collaborator

keewis commented Oct 23, 2019

I don't know whether that is intended, but I think replacing observations with individuals is better done with swap_dims:

>>> xr.__version__
'0.14.0'
>>> ds.rename_dims(observations="individuals").stack(observations=["individuals", "genes"])
<xarray.Dataset>
Dimensions:       (observations: 4, subtissues: 2)
Coordinates:
  * observations  (observations) MultiIndex
  - individuals   (observations) int64 0 0 1 1
  - genes         (observations) object 'a' 'b' 'a' 'b'
  * subtissues    (subtissues) <U1 'c' 'd'
Data variables:
    test          (subtissues, observations) int64 1 1 3 3 2 2 4 4
>>> ds.swap_dims({"observations": "individuals"}).stack(observations=["individuals", "genes"])
<xarray.Dataset>
Dimensions:       (observations: 4, subtissues: 2)
Coordinates:
  * observations  (observations) MultiIndex
  - individuals   (observations) object 'x' 'x' 'y' 'y'
  - genes         (observations) object 'a' 'b' 'a' 'b'
  * subtissues    (subtissues) <U1 'c' 'd'
Data variables:
    test          (subtissues, observations) int64 1 1 3 3 2 2 4 4
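One way to see why swap_dims behaves differently is to model only the index bookkeeping. In the hypothetical pure-Python sketch below, a dataset is reduced to a dict of name: (dims, values) plus a dict of indexes. model_rename_dims renames the dimension everywhere but creates no index for the new name, while model_swap_dims additionally promotes the existing individuals coordinate to be that index. Both functions are simplifications written for illustration under that assumption; they are not xarray's implementation.

```python
def model_rename_dims(variables, indexes, old, new):
    """Hypothetical model: rename dimension `old` to `new` in every variable.

    The index for `old` is dropped and no index is created for `new`.
    """
    renamed = {
        name: (tuple(new if d == old else d for d in dims), values)
        for name, (dims, values) in variables.items()
    }
    kept = {dim: labels for dim, labels in indexes.items() if dim != old}
    return renamed, kept


def model_swap_dims(variables, indexes, old, new):
    """Hypothetical model: replace `old` with the existing 1-D coordinate `new`."""
    dims, values = variables[new]
    if dims != (old,):
        raise ValueError(f"{new!r} must be a 1-D variable along {old!r}")
    swapped, new_indexes = model_rename_dims(variables, indexes, old, new)
    new_indexes[new] = list(values)  # `new` becomes the index of its dimension
    return swapped, new_indexes


variables = {
    "observations": (("observations",), ["x-1", "y-1"]),
    "individuals": (("observations",), ["x", "y"]),
}
indexes = {"observations": ["x-1", "y-1"]}

_, renamed_indexes = model_rename_dims(variables, indexes, "observations", "individuals")
_, swapped_indexes = model_swap_dims(variables, indexes, "observations", "individuals")
print(renamed_indexes)  # {}
print(swapped_indexes)  # {'individuals': ['x', 'y']}
```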

Indeed, it seems that using rename_dims worked on 0.13.0, so this would be a regression. However, I would argue that rename_dims should raise an error if a new dimension already exists (and maybe point to swap_dims).
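A minimal sketch of the proposed check, assuming it only needs the existing dimension names and variable names. check_rename_dims is a hypothetical helper and its error wording is invented for illustration; it is not xarray's code. It also treats an existing variable with the new name as a conflict, since in the MCVE individuals is a coordinate rather than a dimension.

```python
def check_rename_dims(dims, variable_names, mapping):
    """Hypothetical validation for a rename_dims-style call.

    dims: names of the existing dimensions
    variable_names: names of all existing variables (data and coordinates)
    mapping: {old_dim: new_dim}
    """
    for old, new in mapping.items():
        if old not in dims:
            raise ValueError(f"{old!r} is not an existing dimension")
        # An existing variable with the new name also counts as a conflict:
        # in the MCVE, individuals is a coordinate, not a dimension.
        if new in dims or new in variable_names:
            raise ValueError(
                f"cannot rename {old!r} to {new!r} because {new!r} already "
                "exists; consider swap_dims instead"
            )


dims = {"genes", "observations", "subtissues"}
names = {"test", "observations", "individuals", "genes", "subtissues"}

try:
    check_rename_dims(dims, names, {"observations": "individuals"})
except ValueError as err:
    print(err)

check_rename_dims(dims, names, {"observations": "samples"})  # no error
```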

@shoyer
Member

shoyer commented Dec 19, 2019

> However, I would argue that rename_dims should raise an error if a new dimension already exists (and maybe point to swap_dims).

+1, this sounds like a good idea.


3 participants