to_datetime function not working with %Y.%m.%d %H format #21422
Comments
Thanks for the report - issue seems to be here - raising an error if we reach the end of the string and the hour wasn't 2 digits, which looks unnecessary.
PR to fix welcome - to start with I think you could just delete this conditional and see if any tests break (as well as adding new tests) |
@chris-b1 deleting this conditional does solve @johan12345's issue, but it fails multiple tests. I will see if I can modify those tests. |
BUG : to_datetime function not working with %Y.%m.%d %H format pandas-dev#21422
@jbrockmendel brings up a good point in questioning this in the associated PR. While I understand that the datetime module is parsing that single digit hour for the user, I'm not sure that can be guaranteed across all platforms. In fact, the Python documentation on "%H" from C89 states that it represents a "Hour (24-hour clock) as a zero-padded decimal number", which the example is not. I think if pandas tries to make guarantees about how this gets parsed for non-standard directives that we'd be opening up a can of worms, so I'd vote for no action here. Source: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior |
It appears that in the POSIX standard, strptime will handle missing leading 0s (see http://pubs.opengroup.org/onlinepubs/009695399/functions/strptime.html). I really only worry about ambiguous cases; is there any risk? |
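For reference, a quick standard-library check of the asymmetry being discussed: formatting with %H always writes two digits, while CPython's own strptime accepts the hour with or without the leading zero. This only reflects CPython's datetime module and says nothing about platform C libraries; the date is borrowed from the dateutil example further down in this thread.

```python
from datetime import datetime

FMT = "%Y.%m.%d %H"

# Formatting: %H is written as a zero-padded hour.
formatted = datetime(2016, 5, 6, 1).strftime(FMT)
print(formatted)  # 2016.05.06 01

# Parsing: CPython's strptime accepts both the padded and the unpadded hour.
padded = datetime.strptime("2016.05.06 01", FMT)
unpadded = datetime.strptime("2016.05.06 1", FMT)
print(padded == unpadded)  # True
```

So the parse side is more lenient than the documented output format, which is the gap this discussion is about.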
FWIW this would be breaking with both numpy and dateutil. |
we broke with numpy a while back, and its parser is limited so not worried about that, but the dateutil one is more interesting, didn't know that.
from dateutil.parser import parser
parser().parse("2016.05.06 1")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-10-f30f9e318c90> in <module>()
----> 1 parser().parse("2016.05.06 1")
~\AppData\Local\Continuum\Anaconda3\envs\py36\lib\site-packages\dateutil\parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
557
558 if res is None:
--> 559 raise ValueError("Unknown string format")
560
561 if len(res) == 0:
ValueError: Unknown string format
parser().parse("2016.05.06 01")
Out[11]: datetime.datetime(2016, 5, 6, 1, 0) |
@chris-b1 good info. Any concerns with Windows? Not sure where to find that information |
Windows being windows doesn't actually have a |
To me it's only a question of whether we consider Even if the answer to that is no, we need to match python if an explicit format spec is passed, so some change is necessary either way. It's a bug for |
@chris-b1 Right - It is okay that the date is not parseable without specifying an explicit format, but when specifying the actual format, the behavior should be the same as with Thanks everyone for the quick reactions, even before I had time to look into it in detail myself! |
This'll be addressed in #50242, thanks for the report! |
Code Sample, a copy-pastable example if possible
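A minimal sketch of the kind of call this report describes, not the reporter's exact snippet. The input string "2012.01.01 1" is an assumption inferred from the expected output below, the helper pandas_parse is hypothetical, and the '%Y.%m.%d %H:%M' format used for the workaround is likewise assumed. What pandas returns depends on the installed version, so no pandas output is shown.

```python
from datetime import datetime

try:
    import pandas as pd  # optional here; the report was filed against pandas 0.22.0
except ImportError:
    pd = None

FMT = "%Y.%m.%d %H"
RAW = "2012.01.01 1"  # assumed input, chosen to match the expected output

# Standard library: parses the hour-only string with the explicit format.
expected = datetime.strptime(RAW, FMT)
print(expected)  # 2012-01-01 01:00:00


def pandas_parse(value, fmt):
    """Hypothetical helper: return the parsed Timestamp, or the ValueError raised."""
    if pd is None:
        return None
    try:
        return pd.to_datetime(value, format=fmt)
    except ValueError as exc:
        return exc


# The call from the report: same string, same format, but through pandas.
print(pandas_parse(RAW, FMT))

# Workaround from the report: append ':00' so a minutes component is present.
print(pandas_parse(RAW + ":00", FMT + ":%M"))
```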
Problem description
When using to_datetime to parse a date that only includes an hour component, but not minutes and seconds, with a format that is otherwise similar to ISO8601 (such as the format '%Y.%m.%d %H'), a ValueError is raised (see above). This behavior is unexpected as strptime can parse the same date without any problem, using the same format string (see above).
I suspect that the problem is in the _format_is_iso function of pandas._libs.tslibs.parsing, where it is just checked if the ISO format starts with the format given - so this format is recognized as being ISO-like. In this case, the format passed to to_datetime is ignored and tslib.array_to_datetime function is used to parse the date instead, which doesn't seem to be able to handle this kind of format.
My current workaround is to modify the dates to also have a minutes component (append ':00' to every string) so that they can be parsed.
Expected Output
2012-01-01 01:00:00 (same as when using strptime)
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.5.2
Cython: None
numpy: 1.14.2
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.0
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
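To illustrate the suspicion in the problem description, here is a rough sketch of a "does the ISO template start with this format" check. It is a hypothetical re-creation, not the pandas source: the name looks_iso_like, the template string and the separator lists are all assumptions made for the example.

```python
# Hypothetical re-creation of a prefix-based ISO check; not the pandas source.
ISO_TEMPLATE = "%Y{d}%m{d}%d{t}%H:%M:%S.%f"
DATE_SEPS = ["-", "/", ".", " ", ""]  # assumed set of date separators
TIME_SEPS = [" ", "T"]                # assumed set of date/time separators


def looks_iso_like(fmt: str) -> bool:
    """Return True if fmt is a leading fragment of some ISO-style template."""
    return any(
        ISO_TEMPLATE.format(d=d, t=t).startswith(fmt)
        for d in DATE_SEPS
        for t in TIME_SEPS
    )


print(looks_iso_like("%Y.%m.%d %H"))     # True: stops right after the hour
print(looks_iso_like("%Y-%m-%d %H:%M"))  # True
print(looks_iso_like("%d.%m.%Y %H"))     # False: day-first is not a prefix
```

With a check of this shape, '%Y.%m.%d %H' counts as ISO-like even though it ends at the hour, which would explain why the explicit format is set aside in favour of the ISO parsing path.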