
astype not converting timedelta and datetime strings #22100

Closed
fjdiod opened this issue Jul 28, 2018 · 6 comments
Labels
Astype Bug Dtype Conversions Unexpected or buggy dtype conversions Timedelta Timedelta data type

Comments

@fjdiod
Contributor

fjdiod commented Jul 28, 2018

Code Sample, a copy-pastable example if possible

>>> s = pd.Series(['0:0:1'])
>>> s.astype('timedelta64[ns]')
WTF timedelta64[ns]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sergey/projects/pandas/pandas/util/_decorators.py", line 177, in wrapper
    return func(*args, **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/generic.py", line 5182, in astype
    **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/managers.py", line 554, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/managers.py", line 421, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/blocks.py", line 564, in astype
    **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/blocks.py", line 655, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "/home/sergey/projects/pandas/pandas/core/dtypes/cast.py", line 717, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 455, in pandas._libs.lib.astype_intsafe
    util.set_value_at_unsafe(result, i, v)
  File "pandas/_libs/tslibs/util.pxd", line 145, in pandas._libs.tslibs.util.set_value_at_unsafe
    assign_value_1d(arr, i, value)
ValueError: Could not convert object to NumPy timedelta

Same with datetime strings and 'datetime64[ns]'.

Expected Output

>>> s.astype('timedelta64[ns]')
0   00:00:01
dtype: timedelta64[ns]
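Later in this thread the maintainers recommend pd.to_timedelta (and pd.to_datetime) instead of astype for string conversions. Here is a minimal sketch of that route for the Series above. It assumes to_timedelta parses "H:M:S" strings such as '0:0:1'. It yields the one-second value expected above, though the printed form (e.g. "0 days 00:00:01") and the exact resolution may differ between pandas versions.

```python
import pandas as pd

# The Series from the report: a duration stored as an "H:M:S" string (object dtype)
s = pd.Series(['0:0:1'])

# to_timedelta parses the strings instead of going through astype's generic cast
td = pd.to_timedelta(s)
print(td)
print(td.dtype)
```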

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 114f415
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0.dev0+365.g114f41534.dirty
pytest: 3.6.2
pip: 10.0.1
setuptools: 39.2.0
Cython: 0.28.3
numpy: 1.14.5
scipy: 1.1.0
pyarrow: 0.9.0
xarray: 0.10.7
IPython: 6.4.0
sphinx: 1.7.5
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.5
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.4
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.5
lxml: 4.2.2
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.8
pymysql: 0.8.1
psycopg2: None
jinja2: 2.10
s3fs: 0.1.5
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None
gcsfs: 0.1.0

@azelk

azelk commented Dec 9, 2019

The bug is still there. Please fix.

@jreback
Contributor

jreback commented Dec 9, 2019

> The bug is still there. Please fix.

Hence the open status of this issue.

@azelk you are welcome to volunteer a patch.

@mroeschke mroeschke added Timedelta Timedelta data type and removed Datetime Datetime data dtype labels Apr 1, 2020
@jbrockmendel jbrockmendel added the Dtype Conversions Unexpected or buggy dtype conversions label Jun 8, 2021
@nathanckuhn

nathanckuhn commented Jul 13, 2023

I believe this is related, so I thought I'd share. (If distinct enough, please tell me & I can open a new issue.)

Summary: Below is a reproducible example similar to a project where I am trying to change large batches of Pandas DataFrame columns from 'string' to 'datetime64[ns]'. In doing so, I have gotten ValueErrors when astype() cannot convert the string date columns into date dtypes.

The issue seems to arise when
(a) the string column is dtype 'string' not 'object' and
(b) the 'string' dtype column contains a mixture of string date formats.

The example below shows that to_datetime() handles both 'object' and 'string' dtypes, with and without NA, and with and without mixed date formats.

However, astype() cannot handle the (a)+(b) situation.
Why not? This seems like a bug.
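For comparison, here is a minimal sketch of parsing the (a)+(b) case, a mixed-format 'string' column containing NA, with to_datetime. It assumes pandas 2.0 or later, where format='mixed' tells to_datetime to infer the format separately for each element. Under pandas 1.5.3, the version used here, that option does not exist, and the plain pd.to_datetime calls in the example below already succeed. The values are a subset of the 'date_str' column from the example.

```python
import numpy as np
import pandas as pd

# Mixed date formats plus a missing value, stored in the nullable 'string' dtype
date_str = pd.Series(
    ['2000-01-02', '03.04.2015', np.nan, '5/6/2000', '2016-07-08 00:00:00']
).astype('string')

# format='mixed' (pandas >= 2.0) infers the format element by element;
# pass dayfirst=True if '03.04.2015' should mean 3 April rather than 4 March
parsed = pd.to_datetime(date_str, format='mixed')
print(parsed)
```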

Thanks for any thoughts!

Note: I'm running this in Visual Studio Code, hence the "#%%" interactive sections:

#%%
import pandas as pd
import numpy as np
import sys

print('Version Of Python: ' + sys.version)
print('Version Of Pandas: ' + pd.__version__)
print('Version Of Numpy: ' + np.version.version)

# Version Of Python: 3.10.9 | packaged by conda-forge | (main, Jan 11 2023, 15:15:40) [MSC v.1916 64 bit (AMD64)]
# Version Of Pandas: 1.5.3
# Version Of Numpy: 1.24.2

#%%
df = pd.DataFrame({'date_obj': ['2000-01-02', '2015-02-01', np.nan, '03.04.2015', '04.03.2009', np.nan, '5/6/2000', '6/5/2019', np.nan, '2016-07-08 00:00:00', '2007-08-07 00:00:00']}) 
df['date_str'] = df['date_obj'].astype('string')
df['date_str_noNA'] = df['date_str'].fillna('2002-02-20')
df['date_str_same_format'] = pd.Series('2123-01-10').repeat(len(df)).reset_index(drop=True).astype('string')

#%%
df['date_obj_to_datetime'] = pd.to_datetime(df['date_obj'])
df['date_str_to_datetime'] = pd.to_datetime(df['date_str'])
df['date_str_noNA_to_datetime'] = pd.to_datetime(df['date_str_noNA'])
df['date_str_same_format_to_datetime'] = pd.to_datetime(df['date_str_same_format'])

#%%
df['date_obj_astype'] = df['date_obj'].astype('datetime64[ns]')
df['date_str_same_format_astype'] = df['date_str_same_format'].astype('datetime64[ns]')

#%%
df['date_str_noNA_astype'] = df['date_str_noNA'].astype('datetime64[ns]')
    ### ValueError: Error parsing datetime string "03.04.2015" at position 2.
    ### "03.04.2015" is the first row with a different pattern from the beginning.

#%%
df['date_str_astype'] = df['date_str'].astype('datetime64[ns]')
    ### ValueError: Could not convert object to NumPy datetime.

#%%
print(df.dtypes.to_string(), '\n')
print(df.to_string())
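Because the real use case is converting large batches of columns, one option is a simple loop that applies to_datetime column by column. This is a sketch only. It assumes pandas 2.0 or later for format='mixed', and the column names 'start' and 'end' are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose date columns are all 'string' dtype with mixed formats
df = pd.DataFrame({
    'start': ['2000-01-02', '5/6/2000', np.nan],
    'end': ['2016-07-08 00:00:00', '2015-02-01', '6/5/2019'],
}).astype('string')

date_cols = ['start', 'end']  # hypothetical list of columns to convert

# Replace each column with its parsed datetime version
for col in date_cols:
    df[col] = pd.to_datetime(df[col], format='mixed')

print(df.dtypes)
```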

@MarcoGorelli
Member

Thanks for the report. You should use to_datetime and to_timedelta for such conversions.

Closing, then.
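Here is a minimal sketch of that recommendation applied to the datetime half of the original report. The input strings are assumptions, since the report only says "Same with datetime" and does not show its datetime input.

```python
import pandas as pd

# Hypothetical datetime strings (the report does not show its datetime input)
s = pd.Series(['2018-07-28', '2018-07-29'])

# to_datetime parses the strings; the result has a datetime64 dtype
dt = pd.to_datetime(s)
print(dt)
```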

@nathanckuhn

@MarcoGorelli, yes, AND can "astype" be fixed to incorporate the improvements of "to_datetime"?
"astype" is a more generic function and should work just as well as the more specific "to_datetime" function.
To me this is still a bug.
Thank you for your consideration!

@jreback
Contributor

jreback commented Jan 27, 2024

-1 on adding to astype

The reason we have the to_ functions is to expose a number of options for conversion.

astype is quite a simple API, and it should stay that way.
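To illustrate the kind of options meant here, the following is a short sketch using documented parameters of pd.to_datetime and pd.to_timedelta: an explicit format, and errors='coerce'. astype has no equivalent of format or dayfirst. The input values are made up for illustration.

```python
import pandas as pd

# Made-up inputs: day-first dates and durations, each with one unparseable entry
dates = pd.Series(['03.04.2015', '04.03.2009', 'not a date'])
durations = pd.Series(['0:0:1', '1 days 02:00:00', 'oops'])

# An explicit format removes the day/month ambiguity; errors='coerce'
# turns unparseable values into NaT instead of raising
parsed_dates = pd.to_datetime(dates, format='%d.%m.%Y', errors='coerce')

# to_timedelta accepts errors='coerce' as well
parsed_durations = pd.to_timedelta(durations, errors='coerce')

print(parsed_dates)
print(parsed_durations)
```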

8 participants