BUG: not able to convert non-zero padded hour to timestamp. #46849

Closed
3 tasks done
RakeshJarupula opened this issue Apr 23, 2022 · 5 comments
Labels
Bug, Datetime (Datetime data dtype)

Comments

@RakeshJarupula

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
pd.to_datetime('2018-08-18 9', format='%Y-%m-%d %H', infer_datetime_format=True)

Issue Description

pd.to_datetime cannot convert a string whose hour is not zero-padded (the date itself is zero-padded). The call above raises:
ParserError: Unknown string format: 2018-08-18 9

The zero-padded form converts without any issue:
pd.to_datetime('2018-08-18 09', format='%Y-%m-%d %H', infer_datetime_format=True)
OUTPUT: Timestamp('2018-08-18 09:00:00')

Expected Behavior

I expect Timestamp('2018-08-18 09:00:00') even when the hour is not zero-padded.

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.8.13.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_India.1252

pandas : 1.4.2
numpy : 1.20.3
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.2
setuptools : 58.0.4
Cython : 0.29.25
pytest : 6.2.5
hypothesis : None
sphinx : 4.4.0
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.7.1
html5lib : 1.1
pymysql : None
psycopg2 : 2.9.1
jinja2 : 2.11.3
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.4
brotli :
fastparquet : None
fsspec : 2022.02.0
gcsfs : None
markupsafe : 2.0.1
matplotlib : 3.5.0
numba : 0.54.1
numexpr : 2.8.1
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 7.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
snappy : None
sqlalchemy : 1.4.27
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
zstandard : None

RakeshJarupula added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on Apr 23, 2022
RakeshJarupula changed the title from "BUG:" to "BUG: not able to convert non-zero padded hour to timestamp." on Apr 23, 2022
@JosephParampathu
Contributor

JosephParampathu commented May 2, 2022

I have been looking at this and wanted to report my findings so far. Per the documentation for to_datetime, you should not pass a format together with infer_datetime_format, since inference is only attempted when no format is given.

That being said, the issue with passing an hour without a leading zero is that the "format" argument uses the strftime-style directives from datetime. In the strftime documentation, %H is defined as a zero-padded hour (meaning 09, as opposed to 9). If you read both the %H entry and note 9 at the bottom of that table, you can see that the leading zero is optional when parsing with strptime rather than formatting with strftime.

I was able to use the code below to reproduce what you are trying to do without the leading zero, using strptime.

import pandas as pd
from datetime import datetime
print(pd.to_datetime(datetime.strptime("2018-8-18 9", "%Y-%m-%d %H"), infer_datetime_format=True))

While you may decide to use the code above for your use case, I am not sure whether replacing strftime with strptime in pandas would be worthwhile at this time. Any input would be appreciated.
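For more than one value, the same strptime approach can be applied element by element. This is a minimal sketch using only the standard library; parse_unpadded is a hypothetical helper name, not a pandas function, and it assumes every string follows the "%Y-%m-%d %H" layout.

```python
from datetime import datetime
from typing import Iterable, List


def parse_unpadded(values: Iterable[str], fmt: str = "%Y-%m-%d %H") -> List[datetime]:
    """Parse each string with datetime.strptime, which treats the
    leading zero of %H, %m and %d as optional."""
    return [datetime.strptime(value, fmt) for value in values]


# Padded and unpadded hours both parse to the same datetime.
parsed = parse_unpadded(["2018-08-18 9", "2018-08-18 09", "2018-8-18 23"])
```

The resulting list of datetime objects can then be handed to pd.to_datetime, as the snippet above does for a single value. Parsing in a Python loop is likely to be slower than the pandas parser on large inputs.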

simonjayhawkins added the Datetime (Datetime data dtype) label and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label on May 26, 2022
@simonjayhawkins
Member

Thanks @RakeshJarupula for the report and @JosephParampathu for the investigation.

If infer_datetime_format=True is omitted, the traceback is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2084         try:
-> 2085             values, tz_parsed = conversion.datetime_to_datetime64(data)
   2086             # If tzaware, these values represent unix timestamps, so we

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()

TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_27674/4203207622.py in <module>
----> 1 pd.to_datetime("2018-08-18 9", format="%Y-%m-%d %H")

~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
    830             result = convert_listlike(arg, format)
    831     else:
--> 832         result = convert_listlike(np.array([arg]), format)[0]
    833 
    834     return result

~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/tools/datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    463         assert format is None or infer_datetime_format
    464         utc = tz == "utc"
--> 465         result, tz_parsed = objects_to_datetime64ns(
    466             arg,
    467             dayfirst=dayfirst,

~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2088             return values.view("i8"), tz_parsed
   2089         except (ValueError, TypeError):
-> 2090             raise e
   2091 
   2092     if tz_parsed is not None:

~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2073 
   2074     try:
-> 2075         result, tz_parsed = tslib.array_to_datetime(
   2076             data,
   2077             errors=errors,

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

ValueError: time data 2018-08-18 9 doesn't match format specified

I think that the pandas Cython parser should ideally match the stdlib behavior:

import datetime as dt

dt.datetime.strptime("2018-08-18 9", "%Y-%m-%d %H")  # datetime.datetime(2018, 8, 18, 9, 0)

However, I am not sure whether resolving this would also solve the issue in the OP.
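Until the two parsers agree, one possible workaround is to zero-pad the hour before handing the string to pd.to_datetime with the explicit format. The following is only a sketch: pad_hour is a hypothetical helper, and it assumes the input has exactly the shape "YYYY-MM-DD H" used in this report.

```python
import re

# Matches a zero-padded date followed by a single-digit hour, e.g. "2018-08-18 9".
_UNPADDED_HOUR = re.compile(r"^([0-9]{4}-[0-9]{2}-[0-9]{2}) ([0-9])$")


def pad_hour(text: str) -> str:
    """Insert a leading zero before a single-digit hour.

    Strings that do not match the assumed shape are returned unchanged.
    """
    return _UNPADDED_HOUR.sub(r"\1 0\2", text)


padded = pad_hour("2018-08-18 9")
already_padded = pad_hour("2018-08-18 09")
```

The padded string has the form that the report shows converting successfully with format='%Y-%m-%d %H'.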

@MarcoGorelli
Member

This is the part of the code that'd need changing if you wanted to support this:

/* The hours offset */
if (sublen >= 2 && isdigit(substr[0]) && isdigit(substr[1])) {
    offset_hour = 10 * (substr[0] - '0') + (substr[1] - '0');
    substr += 2;
    sublen -= 2;
    if (offset_hour >= 24) {
        if (want_exc) {
            PyErr_Format(PyExc_ValueError,
                         "Timezone hours offset out of range "
                         "in datetime string \"%s\"",
                         str);
        }
        goto error;
    }
} else if (sublen >= 1 && isdigit(substr[0])) {
    offset_hour = substr[0] - '0';
    ++substr;
    --sublen;
} else {
    goto parse_error;
}

In the meantime, I think it's fine to error.
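For readers less familiar with C, the branch structure of the snippet quoted above (two digits if available, otherwise a single digit) can be paraphrased in Python. This is an illustrative sketch only, not pandas code: parse_offset_hour and its return convention are assumptions made for the example.

```python
from typing import Tuple


def _is_ascii_digit(char: str) -> bool:
    # Equivalent of C's isdigit() for the ASCII range.
    return "0" <= char <= "9"


def parse_offset_hour(substr: str) -> Tuple[int, str]:
    """Return (hour, remaining_text), accepting one or two leading digits."""
    if len(substr) >= 2 and _is_ascii_digit(substr[0]) and _is_ascii_digit(substr[1]):
        hour = int(substr[:2])
        if hour >= 24:
            raise ValueError(f"hours offset out of range in {substr!r}")
        return hour, substr[2:]
    if len(substr) >= 1 and _is_ascii_digit(substr[0]):
        # Single-digit fallback: no leading zero required.
        return int(substr[0]), substr[1:]
    raise ValueError(f"cannot parse hours from {substr!r}")


two_digit = parse_offset_hour("09:30")
one_digit = parse_offset_hour("9:30")
```

Both calls yield the same hour, which is the behavior the report asks for in the main hour field.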

@MarcoGorelli
Member

Actually, this can probably be handled as part of #50242

Thanks for the report, I'll add a test case and hopefully close it as part of that

@MarcoGorelli
Member

also, looks like a dupe of #21422, so let's close in favour of that
