Skip to content

DataArray.dt.seconds returns incorrect value for negative timedelta64[ns] #5937

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
leifdenby opened this issue Nov 4, 2021 · 4 comments · Fixed by #8435
Closed

DataArray.dt.seconds returns incorrect value for negative timedelta64[ns] #5937

leifdenby opened this issue Nov 4, 2021 · 4 comments · Fixed by #8435

Comments

@leifdenby
Copy link
Contributor

What happened:

For a negative timedelta64[ns] of 42 nanoseconds DataArray.dt.seconds returned a non-zero value (the returned value was 86399). When I pass in a positive 42 nanosecond timedelta64[ns] with the the TimeDeltaAccessor correctly returns zero. I would have expected both assertions in the example below to have passed, but the second fails. This seems to be a general issue with negative timedelta64[ns].

<xarray.DataArray 'seconds' (dim_0: 1)>
array([0])
Dimensions without coordinates: dim_0
<xarray.DataArray 'seconds' (dim_0: 1)>
array([86399])
Dimensions without coordinates: dim_0
Traceback (most recent call last):
  File "bug_dt_seconds.py", line 15, in <module>
    assert da.dt.seconds == 0
AssertionError

What you expected to happen:

<xarray.DataArray 'seconds' (dim_0: 1)>
array([0])
Dimensions without coordinates: dim_0
<xarray.DataArray 'seconds' (dim_0: 1)>
array([0])
Dimensions without coordinates: dim_0

Minimal Complete Verifiable Example:

# coding: utf-8

import xarray as xr
import numpy as np

# number of nanoseconds
value = 42

da = xr.DataArray([np.timedelta64(value, "ns")])
print(da.dt.seconds)
assert da.dt.seconds == 0

da = xr.DataArray([np.timedelta64(-value, "ns")])
print(da.dt.seconds)
assert da.dt.seconds == 0

Anything else we need to know?:

I've narrowed this down to the call to pd.Series(values.ravel()) in xarray.core.accessor_dt._access_through_series:

ipdb> pd.Series(values.ravel())
0   -1 days +23:59:59.999999958
dtype: timedelta64[ns]

I think the issue arises because pandas turns the numpy timedelta64 into a "minus one day plus a time". This actually does have a number of "seconds" in it, but the "total_seconds" has the expected value:

ipdb> pd.Series(values.ravel()).dt.total_seconds()
0   -4.200000e-08
dtype: float64

Which would correctly round to zero.

I don't think the issue is in pandas, although the output from pandas is counter-intuitive:

ipdb> pd.Series(values.ravel()).dt.seconds
0    86399
dtype: int64

Maybe we should handle this as a special case by taking the absolute value before passing the values to pandas (and then applying the original sign again afterwards)?

Environment:

Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.7 (default, May  6 2020, 04:59:01)
[Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 19.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_GB.UTF-8
LANG: None
LOCALE: ('en_GB', 'UTF-8')
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.18.2
pandas: 1.3.4
numpy: 1.19.1
scipy: 1.5.0
netCDF4: 1.4.2
pydap: installed
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.09.1
distributed: 2021.09.1
matplotlib: 3.2.2
cartopy: 0.18.0
seaborn: 0.10.1
numbagg: None
fsspec: 2021.06.1
cupy: None
pint: 0.18
sparse: None
setuptools: 46.4.0.post20200518
pip: 21.1.2
conda: None
pytest: 6.0.1
IPython: 7.16.1
sphinx: None
@dcherian
Copy link
Contributor

dcherian commented Nov 8, 2021

Your proposal sounds good to me.

Would you mind raising an issue on the pandas bug tracker asking if this is expected behaviour?

@leifdenby
Copy link
Contributor Author

Your proposal sounds good to me.

Would you mind raising an issue on the pandas bug tracker asking if this is expected behaviour?

Great! Yes, I'm happy to raise it with the pandas developers.

@maresb
Copy link
Contributor

maresb commented Nov 9, 2023

This seems like expected behavior to me.

However, I find it confusing and problematic that DataArray.dt.total_seconds is missing. Would it make sense to open a PR to implement that?

@dcherian
Copy link
Contributor

dcherian commented Nov 9, 2023

Would it make sense to open a PR to implement that?

Yes please!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants