[Bug]: Possible inconsistency with xarray when spanning across files #394
Comments
@gleckler1 - is it possible to point to some publicly available data to look into this? I don't think this can be resolved without more information (or data to investigate the issue).
Original comment below. Example data can be downloaded from: https://data.remss.com/smap/SSS/V05.0/FINAL/L3/monthly/
XCDAT Issue with open_mfdataset
This document was written to briefly illustrate an issue with
Issue
This problem occurs when using the
xarray.open_mfdataset
In the code being used,
Where
Note that each netCDF4 file in
Several keywords are set in
This call appropriately reads in all monthly average gridded
xcdat.open_mfdataset
When the same call to
The following error is returned:
In this particular error,
In this error, it is referring to the default value of the
Concluding
It is possible that there is a sequence of keywords using
I think this issue arises because
I think this issue can be resolved with:
I think specifying
@pochedls Thank you for summarizing this issue and troubleshooting.
With data_vars='minimal', for instance if the
Hi @pochedls and @chengzhuzhang, here's what I came up with based on your comments.
Suspected Root Cause
As Jill pointed out in #143, There is a closed xarray issue that talks about adding xarray docs on
Example:
import os
import xcdat as xc
# Named `data_dir` to avoid shadowing the built-in `dir`.
data_dir = "/p/user_pub/PCMDIobs/PCMDIobs2/atmos/3hr/pr/TRMM-3B43v-7/gn/v20200707"
file_pattern = "pr_3hr_TRMM-3B43v-7_BE_gn_v20200707_1998*.nc"
files = os.path.join(data_dir, file_pattern)
# `data_vars="minimal"` (xcdat default)
# ------------------------------------------------------------
ds1 = xc.open_mfdataset(files, data_vars="minimal")
# Notice how "lat_bnds" retains its original dimensions (no "time" dimension)
print(ds1.lat_bnds)
"""
<xarray.DataArray 'lat_bnds' (lat: 400, bnds: 2)>
dask.array<where, shape=(400, 2), dtype=float64, chunksize=(400, 2), chunktype=numpy.ndarray>
Coordinates:
* lat (lat) float64 -49.88 -49.62 -49.38 -49.12 ... 49.38 49.62 49.88
Dimensions without coordinates: bnds
"""
# Spatial averaging works as expected
pr_global1 = ds1.spatial.average("pr", axis=["X", "Y"])
# `data_vars="all"` (xarray default)
# ------------------------------------------------------------
ds2 = xc.open_mfdataset(files, data_vars="all")
# Notice how "lat_bnds" now has a "time" dimension (unexpected)
print(ds2.lat_bnds)
"""
<xarray.DataArray 'lat_bnds' (time: 2920, lat: 400, bnds: 2)>
dask.array<concatenate, shape=(2920, 400, 2), dtype=float64, chunksize=(248, 400, 2), chunktype=numpy.ndarray>
Coordinates:
* time (time) object 1998-01-01 00:00:00 ... 1998-12-31 21:00:00
* lat (lat) float64 -49.88 -49.62 -49.38 -49.12 ... 49.38 49.62 49.88
Dimensions without coordinates: bnds
"""
# This results in the error below with spatial averaging
pr_global2 = ds2.spatial.average("pr", axis=["X", "Y"])
"""
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/home/vo13/xCDAT/xcdat/qa/issue-394-data-vars/qa.py in line 5
15 #%%
16
17 # `data_vars="all"` (xarray default)
18 ds2 = xc.open_mfdataset(files, data_vars="all")
----> 19 pr_global2 = ds2.spatial.average("pr", axis=["X", "Y"])
File ~/xCDAT/xcdat/xcdat/spatial.py:195, in SpatialAccessor.average(self, data_var, axis, weights, keep_weights, lat_bounds, lon_bounds)
192 elif isinstance(weights, xr.DataArray):
193 self._weights = weights
--> 195 self._validate_weights(dv, axis)
196 ds[dv.name] = self._averager(dv, axis)
198 if keep_weights:
File ~/xCDAT/xcdat/xcdat/spatial.py:707, in SpatialAccessor._validate_weights(self, data_var, axis)
705 dim_name = get_dim_keys(data_var, key)
706 if dim_name not in self._weights.dims:
--> 707 raise KeyError(
708 f"The weights DataArray does not include an {key} axis, or the "
709 "dimension names are not the same."
710 )
712 # Check the weight dim sizes equal data var dim sizes.
713 dim_sizes = {key: data_var.sizes[key] for key in self._weights.sizes.keys()}
KeyError: 'The weights DataArray does not include an X axis, or the dimension names are not the same.'
"""
Potential Solution(s)
In this specific issue, it sounds like there are cases where we do want to concatenate on dimensions that are not in the original data var (e.g.,
With this knowledge, we can either:
I'll think about this more, but do either of you have any thoughts here?
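The difference between the two `data_vars` settings shown in the example above can be reproduced without the original files. The following is a minimal sketch that calls `xr.concat` directly on small in-memory datasets; the helper `make_ds` and the synthetic values are hypothetical and only stand in for the per-file datasets that `open_mfdataset` would combine.

```python
import numpy as np
import xarray as xr


def make_ds(t0):
    """Build a tiny synthetic dataset (hypothetical stand-in for one file)."""
    lat = np.array([-45.0, 45.0])
    time = np.arange(t0, t0 + 3)
    return xr.Dataset(
        {
            # Data variable that already has the "time" dimension.
            "pr": (("time", "lat"), np.ones((3, 2))),
            # Bounds variable with no "time" dimension, identical in each file.
            "lat_bnds": (("lat", "bnds"), np.array([[-90.0, 0.0], [0.0, 90.0]])),
        },
        coords={"time": time, "lat": lat},
    )


parts = [make_ds(0), make_ds(3)]

# "minimal": only variables that already contain "time" are concatenated.
minimal = xr.concat(parts, dim="time", data_vars="minimal")

# "all": every data variable is concatenated, so "lat_bnds" gains "time".
everything = xr.concat(parts, dim="time", data_vars="all")

print(minimal["lat_bnds"].dims)
print(everything["lat_bnds"].dims)
```

Under these assumptions, `lat_bnds` keeps its original dimensions with `data_vars="minimal"` and picks up a `time` dimension with `data_vars="all"`, which is the shape change that the weight validation in the traceback above objects to.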
I think we should keep this as is (default
I hit the same issue and am curious whether we came to a conclusion. I am posting my issue and my workarounds for the record.
import glob
import xcdat as xc
ncfiles = glob.glob("/p/css03/esgf_publish/CMIP6/CMIP/NCC/NorESM2-LM/piControl/r1i1p1f1/Amon/ts/gn/v20210118/*.nc")
ds = xc.open_mfdataset(ncfiles)
Workaround attempt 1
Used ds = xc.open_mfdataset(ncfiles, compat='override') or ds = xc.open_mfdataset(ncfiles, compat='override', join="override") returns the following error
Workaround attempt 2
Used ds = xc.open_mfdataset(ncfiles, data_vars="minimum") returns
@tomvothecoder @pochedls @chengzhuzhang Do you happen to know whether
Workaround 3 (succeeded)
ds = xc.open_mfdataset(ncfiles, drop_variables="lon_bnds").bounds.add_missing_bounds(axes=["X"])
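Workaround 3 drops `lon_bnds` and regenerates it, which suggests (but does not confirm) that `lon_bnds` differs between files. A rough way to check that assumption is to compare every variable that lacks the concatenation dimension across the per-file datasets. The sketch below uses synthetic in-memory datasets; `find_mismatched_vars`, `make_ds`, and the `1e-6` offset are hypothetical and are not part of xcdat or xarray.

```python
import numpy as np
import xarray as xr


def find_mismatched_vars(datasets, dim="time"):
    """Return names of variables without `dim` that differ from the first dataset."""
    reference = datasets[0]
    mismatched = []
    for name, ref_var in reference.variables.items():
        if dim in ref_var.dims:
            continue  # variables with `dim` are expected to differ per file
        for other in datasets[1:]:
            if name not in other.variables or not ref_var.equals(other.variables[name]):
                mismatched.append(name)
                break
    return mismatched


def make_ds(t0, lon_bnds_offset=0.0):
    """Build a tiny synthetic dataset (hypothetical stand-in for one file)."""
    lon = np.array([90.0, 270.0])
    lon_bnds = np.array([[0.0, 180.0], [180.0, 360.0]]) + lon_bnds_offset
    return xr.Dataset(
        {
            "ts": (("time", "lon"), np.zeros((2, 2))),
            "lon_bnds": (("lon", "bnds"), lon_bnds),
        },
        coords={"time": np.arange(t0, t0 + 2), "lon": lon},
    )


# The second "file" carries a slightly different lon_bnds (assumed scenario).
parts = [make_ds(0), make_ds(2, lon_bnds_offset=1e-6)]
print(find_mismatched_vars(parts))
```

With real data, the list passed in could be built by opening each file separately, for example with `xr.open_dataset` on every path in `ncfiles`. Any name the function reports is a candidate for `drop_variables`, as in Workaround 3.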
I don't think
Otherwise, you can try
The value should be
I think this is the best option forward if the issue is related to
What happened?
See attached markdown file prepared by Andrew Manaser (RSS).
xcdat_open_mfdataset_issue_README.md
What did you expect to happen?
This problem occurs when using the open_mfdataset method with xcdat (occasionally called xc in this document) to concatenate multiple files of monthly average gridded lat/lon geophysical netCDF4 data along a new axis (in this case time).
Minimal Complete Verifiable Example
No response
Relevant log output
No response
Anything else we need to know?
No response
Environment
Should be included on attached file.