
Mean called on groupby object adds dimensions to undesired variables #3398


Closed
andrewpauling opened this issue Oct 14, 2019 · 3 comments

Comments

@andrewpauling
Contributor

MCVE Code Sample

import numpy as np
import xarray as xr
import cftime

# create time coordinate
tdays = np.arange(0, 730)
time = cftime.num2date(tdays, 'days since 0001-01-01 00:00:00',
                       calendar='noleap')

# create spatial coordinate
lev = np.arange(100)

# Create dummy data
x = np.random.rand(time.size, lev.size)
y = np.random.rand(lev.size)

# Create sample Dataset
ds = xr.Dataset({'sample_data': (['time', 'lev'],  x),
                 'independent_data': (['lev'], y)},
                coords={'time': (['time'], time),
                        'lev': (['lev'], lev)})

# Perform groupby and mean
ds2 = ds.groupby('time.month').mean(dim='time')

Actual Output

<xarray.Dataset>
Dimensions:           (lev: 100, month: 12)
Coordinates:
  * lev               (lev) int64 0 1 2 3 4 5 6 7 8 ... 92 93 94 95 96 97 98 99
  * month             (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    sample_data       (month, lev) float64 0.5143 0.554 0.5027 ... 0.5246 0.5435
    independent_data  (month, lev) float64 0.01667 0.4687 ... 0.1015 0.7459

Expected Output

<xarray.Dataset>
Dimensions:           (lev: 100, month: 12)
Coordinates:
  * lev               (lev) int64 0 1 2 3 4 5 6 7 8 ... 92 93 94 95 96 97 98 99
  * month             (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    sample_data       (month, lev) float64 0.5143 0.554 0.5027 ... 0.5246 0.5435
    independent_data  (lev) float64 0.01667 0.4687 ... 0.1015 0.7459

Problem Description

The variable independent_data above has no time dimension to begin with, but after groupby('time.month').mean(dim='time') is applied to the Dataset it gains a month dimension, which is meaningless for this variable.
Ideally, the operation would leave independent_data untouched, since it does not depend on the dimension being grouped and reduced.
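
A possible workaround, sketched under some assumptions: apply the groupby only to the variables that actually have the grouped dimension, then merge the untouched variables back in. The helper name groupby_mean_keep_static is hypothetical (it is not part of xarray), and the time axis below uses a standard-calendar pandas.date_range instead of the cftime noleap calendar from the MCVE, purely to keep the sketch dependency-light.

```python
import numpy as np
import pandas as pd
import xarray as xr


def groupby_mean_keep_static(ds, group, dim):
    """Hypothetical helper (not an xarray API): grouped mean over `dim`
    that leaves variables lacking `dim` untouched."""
    # Variables that depend on the dimension being reduced
    with_dim = [name for name, var in ds.data_vars.items() if dim in var.dims]
    # Variables that do not; these should pass through unchanged
    without_dim = [name for name in ds.data_vars if name not in with_dim]

    reduced = ds[with_dim].groupby(group).mean(dim=dim)
    return xr.merge([reduced, ds[without_dim]])


# Assumption: standard calendar instead of the MCVE's cftime 'noleap' axis
time = pd.date_range("2001-01-01", periods=730, freq="D")
lev = np.arange(100)

ds = xr.Dataset(
    {"sample_data": (["time", "lev"], np.random.rand(time.size, lev.size)),
     "independent_data": (["lev"], np.random.rand(lev.size))},
    coords={"time": time, "lev": lev},
)

ds2 = groupby_mean_keep_static(ds, "time.month", "time")
print(ds2["independent_data"].dims)
```

With this approach independent_data keeps only its lev dimension, matching the expected output above, while sample_data is still reduced to monthly means.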

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.2

xarray: 0.12.2
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.1.0
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 41.0.1
pip: 19.1.1
conda: None
pytest: None
IPython: 7.2.0
sphinx: 2.1.2

@jhamman
Member

jhamman commented Oct 14, 2019

@andrewpauling - Can you confirm this is still an issue with xarray v0.14 (released today)?

@andrewpauling
Contributor Author

@jhamman Yes, I just updated to v0.14 and the issue is still present.

@dcherian
Contributor

Closing as dupe of #2145

3 participants