xr.to_netcdf() alters time dimension #8542
Comments
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
@lohae Could you please check that the time units are the same for all of the individual files?
@kmuehlbauer Hmm, difficult: unfortunately the data is stored as one netCDF file per 15 min interval, and opening every one of the 170k+ files takes forever. But if that were the reason, why does everything look OK before I save it to disk?
That's just a guess; you would only have to check a few files where the times end up corrupted. The suspicion is that the units differ between files and that something breaks when encoding again with the units from the first file. If you can share the first file and one which gets corrupted, that would also be an option to get to the bottom of this.
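For illustration, a quick spot check along these lines might look like the following (the file names are placeholders, not from the report):

import xarray as xr

# Inspect the raw, undecoded time units of a few of the 15-minute files.
# The paths below are hypothetical placeholders.
for path in ["site_2018-01-01T0000.nc", "site_2018-06-15T0915.nc"]:
    with xr.open_dataset(path, decode_times=False) as f:
        print(path, f["time"].attrs.get("units"), f["time"].dtype)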
Because the data is CF-encoded on write, and obviously something happens during that step. The thing is that each file can have different time units; they will be decoded correctly and look fine in the combined dataset. On write, only the units from the first file survive and everything is encoded with those units. Normally this should not be a problem for time units, at least not for the time ranges here. There might be more CF encoding involved, given the number your max value counts up to, but without more information on your source data we can only speculate. One more thing you might check is the output of ds.time.encoding.

Possible Workaround

A possible workaround is to drop the encoding before writing:

ds = ds.drop_encoding()

This should create fresh time units on encode, fitting your data. That's probably equivalent to your workaround of reassigning the time coordinate.

Update: drop_encoding() removes the decoding information of every data variable and coordinate, which might not be wanted. So removing only the encoding of the time coordinate may be preferable.
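As a concrete sketch of those two variants (using a small stand-in dataset; the encoding values are assumed for illustration, not taken from the report):

import numpy as np
import pandas as pd
import xarray as xr

# Small stand-in dataset; the real one comes from the merged source files.
time = pd.date_range("2018-01-01", periods=8, freq="15min")
ds = xr.Dataset({"x": ("time", np.arange(8.0))}, coords={"time": time})
ds.time.encoding = {"units": "seconds since 2018-01-01", "dtype": "int16"}  # as if inherited from the first file

# Variant 1: drop the inherited encoding everywhere, so fresh time units
# (and dtype) are chosen on write.
ds.drop_encoding().to_netcdf("fixed_all.nc")

# Variant 2: clear only the time coordinate's encoding, keeping everything else.
ds.time.encoding = {}
ds.to_netcdf("fixed_time_only.nc")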
Thanks @kmuehlbauer, we are getting closer! I think the int overflow is what is happening, as the time encoding returns the following (the same for all 170k+ files); note that I replaced the source with ??, as it contained my login credentials.
As you pointed out, that means the maximum representable value would be 2018-01-01 09:06:07, given the maximum number of seconds after 2018-01-01 that fits in int16. After saving and loading the netCDF, ds.time.max() indeed comes out just below that limit, at 2018-01-01T09:06:04.
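To make the overflow concrete, here is a purely illustrative numpy calculation, assuming 15-minute steps encoded as int16 seconds since 2018-01-01 as discussed above:

import numpy as np

# 174910 timestamps at 15-minute (900 s) spacing, as offsets from 2018-01-01.
seconds = np.arange(174_910, dtype=np.int64) * 900

# int16 can only represent offsets up to 32767 s, i.e. 2018-01-01 09:06:07.
print(np.iinfo(np.int16).max)        # 32767

# Casting down silently wraps around, giving small, unordered offsets.
wrapped = seconds.astype(np.int16)
print(wrapped[36:39])                # [32400, -32236, -31336]: past the limit, values wrap

# The largest surviving positive offset is 32764 s = 09:06:04, and only 16384
# distinct values remain, matching the numbers in the report (max of
# 2018-01-01T09:06:04 and 16384 unique times out of 174910).
print(wrapped.max(), len(np.unique(wrapped)))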
I still can't really figure out what exactly is happening in the int overflow situation and why it happens in the first place, but I must also say that I am mostly a data user and do not have any deeper knowledge of the internals of xarray's (or netCDF's?) time format. Is it normal that time is stored in units counted from the first time record of the dataset? Also, removing only the time encoding worked for me.

So, I have a solution/workaround now; however, I believe this can be quite annoying, as everything looks correct before saving but then is not. In my case I was producing data for several sites in an automated way and only noticed the problem afterwards. Maybe I should implement some extra checks on the time dimension in future endeavors.
If there is a datetime64 array, xarray will encode it. If there are no time units given, xarray will try to find suitable time units by inspecting the array; see lines 420 to 440 at commit 2971994.
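For intuition, a minimal sketch of that unit inference using xarray's internal helper (not public API, so its location may change between versions):

import pandas as pd
from xarray.coding.times import infer_datetime_units

# With no units stored in the encoding, xarray derives a reference date and a
# resolution that fit the data, e.g. for a clean 15-minute axis:
times = pd.date_range("2018-01-01", periods=8, freq="15min").values
print(infer_datetime_units(times))  # e.g. 'minutes since 2018-01-01 00:00:00'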
@lohae Did you get everything sorted out here, or is there something that should be addressed?
@kmuehlbauer I think it's fine; I used the workaround and will be aware of this in the future! Thanks very much!
@lohae Glad you can use the workaround. I'll close for now. Please reopen or open a follow-up issue if there is anything to do.
…imedelta` (#8575)
* Add proof of concept dask-friendly datetime encoding
* Add dask support for timedelta encoding and more tests
* Minor error message edits; add what's new entry
* Add return type for new tests
* Fix typo in what's new
* Add what's new entry for update following #8542
* Add full type hints to encoding functions
* Combine datetime64 and timedelta64 zarr tests; add cftime zarr test
* Minor edits to what's new
* Address initial review comments
* Initial work toward addressing typing comments
* Restore covariant=True in T_DuckArray; add type: ignores
* Tweak netCDF3 error message
* Move what's new entry
* Remove extraneous text from merge in what's new
* Remove unused type: ignore comment
* Remove word from netCDF3 error message
What is your issue?
Hi!
I was downloading some data from single files (15 min temporal resolution, with some smaller gaps here and there) and wanted to save it for further processing. If I reopen the written netCDF file, the time dimension is distorted in a way I cannot really understand. Basically, the 15 min spacing is changed into something between the first timestamp (e.g. 2018-01-01 00:00:00) and a time a few hours later. The timestamps are also unordered, as the latest time seems to be somewhere in the middle.
My steps are basically a download script, which is not really reproducible as it needs login tokens, but afterwards everything is purely xarray:
Then I simply call

ds.to_netcdf('filename.nc')

and when I re-open it with xr.open_dataset('filename.nc') I get the funny data described below, where ds.time.max() is array('2018-01-01T09:06:04.000000000', dtype='datetime64[ns]') with argmax=array(7383, dtype=int64), so the time coordinate is not even increasing.

Interestingly, saving it, opening it, assigning the correct time values from the ds as it was before saving (ds.assign_coords(time=correct_time)), and then saving it again seems to be a workaround, but I would like to understand whether I am missing something or whether this might be a bug. I had to re-download quite a lot of data because of this, as I was not able to recover the correct time dimension from the altered one. If I open the corrupted file with decode_times=False, it gives me seconds since 2018-01-01 with only np.unique(ds.time) = 16384 unique values, whereas len(ds.time) = 174910.

Thanks in advance!
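Putting the pieces together, here is a hypothetical, self-contained sketch of the failure mode described in this issue (synthetic data; the int16 encoding is set by hand to mimic what was inherited from the source files; on the xarray version used in this report the write overflows silently, while newer releases may warn or raise instead):

import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the merged 15-minute dataset.
time = pd.date_range("2018-01-01", periods=174_910, freq="15min")
ds = xr.Dataset({"x": ("time", np.zeros(time.size))}, coords={"time": time})

# Encoding as if inherited from the first source file: int16 seconds since 2018-01-01.
ds.time.encoding = {"units": "seconds since 2018-01-01", "dtype": "int16"}

ds.to_netcdf("corrupted.nc")   # offsets beyond 32767 s no longer fit into int16
out = xr.open_dataset("corrupted.nc")
print(out.time.max().values)                         # stuck near 2018-01-01T09:06:04
print(out.time.to_index().is_monotonic_increasing)   # False: the times are unordered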