BUG: not able to convert non-zero padded hour to timestamp. #46849
Comments
I have been looking at this and wanted to report my findings so far. Per the documentation for to_datetime, you should not pass a format when infer_datetime_format is used, since infer_datetime_format only guesses whether the year or the day comes first in the format. That said, the problem with passing a time whose hour has no padding zero is that the "format" argument uses strftime from datetime. You can see here that strftime requires hours to be zero-padded (09, as opposed to 9). If you read both the %H information and note 9 at the bottom of that link, you can see that the padding zero is optional when using strptime instead of strftime. I was able to use the code below to reproduce what you are trying to do without the leading zero, using strptime.
import pandas as pd
from datetime import datetime
print(pd.to_datetime(datetime.strptime("2018-8-18 9", "%Y-%m-%d %H"), infer_datetime_format=True))
While you may decide to use the code above for your use case, I am not sure whether replacing strftime with strptime in pandas would be worthwhile at this time. Any input would be appreciated.
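The stdlib behaviour described in the comment above can be checked without pandas. The following is a minimal sketch using only the datetime module; the FMT constant and the variable names are chosen here for illustration. It shows that strptime accepts the hour with or without a leading zero for %H, while strftime always writes the zero-padded form.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H"

# Parsing: the stdlib accepts the hour with or without a leading zero.
padded = datetime.strptime("2018-08-18 09", FMT)
unpadded = datetime.strptime("2018-08-18 9", FMT)
assert padded == unpadded == datetime(2018, 8, 18, 9)

# Formatting: %H always emits two digits.
print(unpadded.strftime(FMT))  # 2018-08-18 09
```

So the mismatch reported in this issue is about how strictly the input is parsed, not about how the result is formatted.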
Thanks @RakeshJarupula for the report and @JosephParampathu for the investigation. if the
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2084 try:
-> 2085 values, tz_parsed = conversion.datetime_to_datetime64(data)
2086 # If tzaware, these values represent unix timestamps, so we
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
/tmp/ipykernel_27674/4203207622.py in <module>
----> 1 pd.to_datetime("2018-08-18 9", format="%Y-%m-%d %H")
~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
830 result = convert_listlike(arg, format)
831 else:
--> 832 result = convert_listlike(np.array([arg]), format)[0]
833
834 return result
~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/tools/datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
463 assert format is None or infer_datetime_format
464 utc = tz == "utc"
--> 465 result, tz_parsed = objects_to_datetime64ns(
466 arg,
467 dayfirst=dayfirst,
~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2088 return values.view("i8"), tz_parsed
2089 except (ValueError, TypeError):
-> 2090 raise e
2091
2092 if tz_parsed is not None:
~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2073
2074 try:
-> 2075 result, tz_parsed = tslib.array_to_datetime(
2076 data,
2077 errors=errors,
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
ValueError: time data 2018-08-18 9 doesn't match format specified
I think that the pandas Cython parser should ideally match the stdlib behavior, but I am not sure whether resolving this would also solve the issue in the OP.
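Until the parser accepts a non-padded hour, one possible workaround is to normalize the string before parsing. The sketch below is only an illustration: pad_hour is a hypothetical helper, not part of pandas, and it assumes the hour is the last field in the string, as in the example from this issue. It uses only the stdlib so it can be run on its own.

```python
import re
from datetime import datetime

# Hypothetical helper, not part of pandas: zero-pad a one-digit hour
# that appears as the last, whitespace-separated field of the string.
_TRAILING_HOUR = re.compile(r"(?<=\s)(\d)$")


def pad_hour(text: str) -> str:
    """Return text with a trailing one-digit hour padded to two digits."""
    return _TRAILING_HOUR.sub(r"0\g<1>", text)


normalized = pad_hour("2018-08-18 9")
print(normalized)  # 2018-08-18 09

# Already padded or two-digit hours are left untouched.
assert pad_hour("2018-08-18 09") == "2018-08-18 09"
assert pad_hour("2018-08-18 19") == "2018-08-18 19"

# The normalized string parses with the strict two-digit hour format.
print(datetime.strptime(normalized, "%Y-%m-%d %H"))  # 2018-08-18 09:00:00
```

The normalized string has the same zero-padded shape as the '2018-08-18 09' input that the issue description reports as converting without problems.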
This is the part of the code that'd need changing if you wanted to support this: pandas/pandas/_libs/tslibs/src/datetime/np_datetime_strings.c, lines 581 to 601 in a37b78d.
In the meantime, I think it's fine to raise an error.
Actually, this can probably be handled as part of #50242. Thanks for the report, I'll add a test case and hopefully close it as part of that.
Also, this looks like a dupe of #21422, so let's close in favour of that.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Not able to convert a string with a non-zero-padded hour and a zero-padded date to a timestamp. The call fails with:
ParserError: Unknown string format: 2018-08-18 9
There is no issue converting the zero-padded equivalent:
pd.to_datetime('2018-08-18 09', format='%Y-%m-%d %H', infer_datetime_format=True)
OUTPUT:
Timestamp('2018-08-18 09:00:00')
Expected Behavior
I am expecting
Timestamp('2018-08-18 09:00:00')
with a non-zero-padded hour.
Installed Versions
INSTALLED VERSIONS
commit : 4bfe3d0
python : 3.8.13.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_India.1252
pandas : 1.4.2
numpy : 1.20.3
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.2
setuptools : 58.0.4
Cython : 0.29.25
pytest : 6.2.5
hypothesis : None
sphinx : 4.4.0
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.7.1
html5lib : 1.1
pymysql : None
psycopg2 : 2.9.1
jinja2 : 2.11.3
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.4
brotli :
fastparquet : None
fsspec : 2022.02.0
gcsfs : None
markupsafe : 2.0.1
matplotlib : 3.5.0
numba : 0.54.1
numexpr : 2.8.1
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 7.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
snappy : None
sqlalchemy : 1.4.27
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
zstandard : None