“ValueError: chunksize cannot exceed dimension size” when trying to write xarray to netcdf #1225
Comments
I've also just encountered this. Will try to reproduce a self-contained example.
I've been encountering this as well, and I don't want to use the scipy engine workaround. If you can tell me what a "self-contained" example means, I can also try to provide one.
@tbohn "self-contained" just means something that I can run on my machine. For example, the code above plus the "somefile.nc" netCDF file that I can load to reproduce this example. Thinking about this a little more, I think the issue is somehow related to the
The bug is somewhere in our handling of chunksize encoding for netCDF4, but it is difficult to fix it without being able to run code that reproduces it. |
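For what it's worth, a minimal self-contained sketch of the failure mode might look like the following (the names and values are made up, and whether it still errors depends on the xarray/netCDF4 versions, since later releases changed how invalid chunksize encodings are handled):

```python
import numpy as np
import xarray as xr

# A variable carrying a chunksize encoding larger than its dimension, as if the
# encoding had been inherited from a file whose chunking no longer fits the data.
ds = xr.Dataset({'x': (('t',), np.arange(5.0))})
ds['x'].encoding['chunksizes'] = (10,)      # exceeds len(t) == 5
ds['x'].encoding['original_shape'] = (5,)   # the shape itself is unchanged

ds.to_netcdf('repro.nc')  # on affected versions this raised
                          # ValueError: chunksize cannot exceed dimension size
```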
OK, here's my code and the file that it works (fails) on. Code:

```python
import os.path
import numpy as np
import xarray as xr

ds = xr.open_dataset('veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc')
ds_out = ds.isel(lat=slice(0, 16), lon=slice(0, 16))
#ds_out.encoding['unlimited_dims'] = 'time'
ds_out.to_netcdf('test.out.nc')
```

Note that I commented out the attempt to make 'time' unlimited; if I attempt it, I get a slightly different chunk size error ('NetCDF: Bad chunk sizes'). I realize that for now I can use 'ncks' as a workaround, but it seems to me that xarray should be able to do this too. File (attached).
(Note also that for the example nc file I provided, the slice that my example code makes contains nothing but null values - but that's irrelevant - the error happens for other slices that do contain non-null values.)
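As an aside, one way to see what xarray has carried over from the source file (and will try to reuse on write) is to look at a variable's `.encoding` dict; a small sketch, assuming the same file and slice as in the code above:

```python
import xarray as xr

ds = xr.open_dataset('veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc')
ds_out = ds.isel(lat=slice(0, 16), lon=slice(0, 16))

# The chunk sizes recorded in the source file survive the slice, even though
# the lat/lon dimensions are now only 16 long.
print(ds_out['LAI'].encoding.get('chunksizes'))  # e.g. (19, 1, 160, 160)
print(dict(ds_out.sizes))                        # lat and lon are now 16
```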
@tbohn - What is happening here is that xarray is storing the netCDF4 chunk size from the input file. For the LAI variable in your example, that is `LAI:_ChunkSizes = 19, 1, 160, 160 ;` (you can see this with `ncdump -h -s filename.nc`):

```
$ ncdump -s -h veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc
netcdf veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates {
dimensions:
        veg_class = 19 ;
        lat = 160 ;
        lon = 160 ;
        time = UNLIMITED ; // (5 currently)
variables:
        float Cv(veg_class, lat, lon) ;
                Cv:_FillValue = -1.f ;
                Cv:units = "-" ;
                Cv:longname = "Area Fraction" ;
                Cv:missing_value = -1.f ;
                Cv:_Storage = "contiguous" ;
                Cv:_Endianness = "little" ;
        float LAI(veg_class, time, lat, lon) ;
                LAI:_FillValue = -1.f ;
                LAI:units = "m2/m2" ;
                LAI:longname = "Leaf Area Index" ;
                LAI:missing_value = -1.f ;
                LAI:_Storage = "chunked" ;
                LAI:_ChunkSizes = 19, 1, 160, 160 ;
                LAI:_Endianness = "little" ;
...
```

Those integers correspond to the dimensions of LAI. When you slice your dataset, you end up with lat/lon dimensions that are now smaller than the _ChunkSizes. When writing this back to netCDF, xarray is still trying to use the original encoding attribute.

The logical fix is to validate this encoding attribute and either 1) throw an informative error if something isn't going to work, or 2) change the ChunkSizes.
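A sketch of option 2, under the assumption that simply clipping each variable's stored chunk sizes to the new dimension lengths before writing is acceptable (the helper name `fix_chunk_encoding` is hypothetical, not part of xarray):

```python
def fix_chunk_encoding(ds):
    """Clip any stored netCDF4 chunksizes to the current dimension lengths."""
    for var in ds.variables.values():
        chunksizes = var.encoding.get('chunksizes')
        if chunksizes is not None:
            var.encoding['chunksizes'] = tuple(
                min(chunk, size) for chunk, size in zip(chunksizes, var.shape)
            )
    return ds

# Hypothetical usage with the dataset from this thread:
# ds_out = fix_chunk_encoding(ds.isel(lat=slice(0, 16), lon=slice(0, 16)))
# ds_out.to_netcdf('test.out.nc')
```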
OK, thanks Joe and Stephan.
Is there any news on this? I have the same problem. A reset_chunksizes() method would be very helpful. Also, what is the cleanest way to remove all chunk size info? I have a very long computation and it fails at the very end with the mentioned error message. My file is patched together from many sources... cheers
@ChrWerner Sorry to hear about your trouble, I will take another look at this. Right now, your best bet is probably something like:

```python
def clean_dataset(ds):
    for var in ds.variables.values():
        if 'chunksizes' in var.encoding:
            del var.encoding['chunksizes']
```
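For example (file names here are hypothetical), the helper above could be applied just before writing; it modifies the encodings in place:

```python
import xarray as xr

ds = xr.open_dataset('patched_together.nc')  # hypothetical input file
clean_dataset(ds)                            # drop stale chunksize encodings in place
ds.to_netcdf('output.nc')
```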
Thanks for that, Stephan. The workaround looks good for the moment ;-) ... cheers
Doing some digging, it turns out this came up quite a while ago back in #156, where we added some code to fix it. Looking at @tbohn's dataset, the problem variable is actually the coordinate variable `time`. For some reason netCDF4 gives it a chunking of 2 ** 20, even though it only has length 5. This leads to an error when we write a file back with the original chunking.
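A quick way to confirm this sort of mismatch (assuming the same file as above) is to compare the stored chunksizes encoding with the actual dimension length:

```python
import xarray as xr

ds = xr.open_dataset('veg_hist.0_10n.90_80w.2000_2016.mode_PFT.5dates.nc')

# The encoding captured from the file can disagree wildly with the data size,
# e.g. a chunk size of 2 ** 20 on a coordinate that is only 5 long.
print(ds['time'].encoding.get('chunksizes'))
print(ds.sizes['time'])
```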
Reported on StackOverflow: http://stackoverflow.com/questions/39900011/valueerror-chunksize-cannot-exceed-dimension-size-when-trying-to-write-xarray

Unfortunately, the given example is not self-contained. Apparently this works if `engine='scipy'` is passed to `to_netcdf`. Something strange is definitely going on; I suspect a bug.
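For reference, the `engine='scipy'` workaround mentioned above just selects the SciPy backend at write time, which produces a netCDF3 file and so never goes through the netCDF4 chunk-size path; a minimal sketch with a placeholder dataset:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'x': (('t',), np.arange(5.0))})  # placeholder dataset
ds.to_netcdf('out.nc', engine='scipy')            # writes netCDF3, no HDF5 chunking involved
```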