open_mfdataset cannot open multiple netcdf files written by NASA/GEOS MAPL v1.0.0 that contain data on a cubed-sphere grid #3286
They appear to be the same:

import xarray as xr
ds1 = xr.open_dataset('GCHP.SpeciesConc.20160716_1200z.nc4')
ds2 = xr.open_dataset('GCHP.AerosolMass.20160716_1200z.nc4')
print(ds1['anchor'].values - ds2['anchor'].values)

Which gives:
I am not 100% sure what this "anchor" variable represents. It was added when NASA updated MAPL to v1.0.0. It seems to be something to do with the cubed-sphere coordinates, but for the purposes of plotting and analyzing the data we don't need it. If it helps, I also get this error if I just try to subtract the DataArrays from each other directly, instead of the numpy ndarray values:

import xarray as xr
ds1 = xr.open_dataset('GCHP.SpeciesConc.20160716_1200z.nc4')
ds2 = xr.open_dataset('GCHP.AerosolMass.20160716_1200z.nc4')
dr = ds1['anchor'] - ds2['anchor']
print(dr)

Which gives:
If two arrays are the same like this, is there a way to manually tell open_mfdataset not to broadcast them but to use the same values?
It looks like anchor has a repeated dimension name. This is not well supported in xarray; see #1378. If you don't need it, then I think it's best to drop it.
Thanks again. I will implement a workaround to drop it (probably a wrapper function that calls open_mfdataset). Good to know.
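Such a wrapper might look like the following minimal sketch. The function name is hypothetical; it simply forwards to xr.open_mfdataset while defaulting drop_variables (the keyword mentioned later in this thread) to "anchor":

```python
import xarray as xr

def open_mfdataset_no_anchor(paths, **kwargs):
    """Hypothetical wrapper: forward everything to xr.open_mfdataset,
    but skip the "anchor" variable that MAPL v1.0.0 output contains."""
    kwargs.setdefault("drop_variables", "anchor")
    return xr.open_mfdataset(paths, **kwargs)
```

Because setdefault is used, a caller can still override drop_variables explicitly if needed.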
You should be able to do that with the drop_variables keyword.
You can use the drop_variables keyword.
@jhamman Hahaha.
Thanks, I'll check it out. Wasn't aware of drop_variables. |
Closing as a duplicate of #1378 |
According to xarray issues pydata/xarray#3286 and pydata/xarray#1378, the open_mfdataset function has problems creating a merged dataset from multiple files in which variables have repeated dimension names. The easiest thing to do in this case is to prevent such variables from being read in. We have now added the drop_variables keyword to avoid reading in the "anchor" variable in all calls to open_dataset and open_mfdataset in both benchmark.py and core.py. This variable is only present in GCHP-created netCDF files using MAPL v1.0.0, which is in GCHP 12.5.0 and later. This commit should resolve GCPy issue #26: #26

Signed-off-by: Bob Yantosca <[email protected]>
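The read-time fix described in that commit can be sketched in isolation with synthetic data. This is a hypothetical round trip, not the GCPy code itself: the variable names are illustrative, and the placeholder "anchor" here has ordinary (non-repeated) dimensions, so it only demonstrates the drop_variables mechanism:

```python
import numpy as np
import xarray as xr

# Build a tiny synthetic dataset and write it to netCDF. "anchor" is a
# placeholder standing in for the MAPL variable of the same name.
ds = xr.Dataset(
    {
        "SpeciesConc": (("lat", "lon"), np.zeros((3, 4))),
        "anchor": (("lat", "lon"), np.ones((3, 4))),
    }
)
ds.to_netcdf("example.nc")

# drop_variables skips "anchor" at read time, so the problematic
# variable never has to be merged at all.
reopened = xr.open_dataset("example.nc", drop_variables="anchor")
print(list(reopened.data_vars))
```

The same drop_variables argument is accepted by open_mfdataset, which passes it through to each underlying open_dataset call.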
MCVE Code Sample
First download these files:
Then run this code:
Expected Output
This should load data from both files into a single xarray Dataset object and print its contents.
Problem Description
Instead, this error occurs:
It seems to get hung up on trying to merge the "anchor" variable. As a workaround, if I drop the "anchor" variable from both datasets and then merge them, the merge works properly.
Output of xr.show_versions():
xarray: 0.12.1
pandas: 0.25.1
numpy: 1.16.4
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.6.2
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.3.0
distributed: 2.3.2
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 41.2.0
pip: 19.2.2
conda: None
pytest: 4.2.0
IPython: 7.7.0
sphinx: 2.1.2