String to datetime conversion with custom format #17167

@tritemio

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
import pandas as pd

df = pl.DataFrame({'dt': "2024-06-03 20:02:48.6800000"})
dt_format = "%Y-%m-%d %H:%M:%S%.6f0"  # NOTE: this format has a trailing 0
df['dt'].str.to_datetime(dt_format, time_unit='ns')

Log output

Traceback (most recent call last):
  File "/Users/anto/src/poste/sda-poste-logistics/script/polars_bug_datetime.py", line 7, in <module>
    df["dt"].str.to_datetime(dt_format, time_unit="ns")
  File "/Users/anto/src/poste/sda-poste-logistics/venv/lib/python3.11/site-packages/polars/series/utils.py", line 107, in wrapper
    return s.to_frame().select_seq(f(*args, **kwargs)).to_series()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anto/src/poste/sda-poste-logistics/venv/lib/python3.11/site-packages/polars/dataframe/frame.py", line 8524, in select_seq
    return self.lazy().select_seq(*exprs, **named_exprs).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anto/src/poste/sda-poste-logistics/venv/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 1909, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.InvalidOperationError: conversion from `str` to `datetime[ns]` failed in column 'dt' for 1 out of 1 values: ["2024-06-03 20:02:48.6800000"]

You might want to try:
- setting `strict=False` to set values that cannot be converted to `null`
- using `str.strptime`, `str.to_date`, or `str.to_datetime` and providing a format string

Issue description

Converting a string to datetime with an explicit format string should support decoding custom formats.

In this example the input has 7 fractional-second digits after the decimal point. The last digit is always zero, however, and should be consumed by the literal trailing 0 in the format string.

Instead, Polars raises the error above during conversion.

Stripping the extra zero from the string before attempting the conversion works correctly in Polars:

dt_format1 = "%Y-%m-%d %H:%M:%S%.6f"  # NOTE: no trailing 0 in the format
df['dt'].str.strip_suffix('0').str.to_datetime(dt_format1, time_unit='ns')
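
For comparison, the same strip-then-parse idea can be sketched in plain Python. This is only an illustration, not part of the report: `parse_with_extra_zero` is a hypothetical helper, and the check that the seventh digit is really 0 is an added assumption about how strict the workaround should be.

```python
from datetime import datetime


def parse_with_extra_zero(value: str) -> datetime:
    """Hypothetical helper: drop a seventh fractional digit only if it is 0."""
    # Split on the last dot so that `fraction` holds the fractional seconds.
    head, dot, fraction = value.rpartition(".")
    if dot != "." or len(fraction) != 7 or not fraction.isdigit() or fraction[-1] != "0":
        raise ValueError(f"expected 7 fractional digits ending in 0: {value!r}")
    # Python's %f accepts 1 to 6 digits, so parse the first six only.
    return datetime.strptime(f"{head}.{fraction[:6]}", "%Y-%m-%d %H:%M:%S.%f")


print(parse_with_extra_zero("2024-06-03 20:02:48.6800000"))
# prints: 2024-06-03 20:02:48.680000
```

Unlike a bare suffix strip, this sketch rejects inputs whose seventh digit is not zero instead of passing them on to the parser.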

Pandas accepts the original format and converts the string correctly, as does Python's datetime:

x = pd.to_datetime(
    df["dt"].to_pandas(use_pyarrow_extension_array=True),
    format="%Y-%m-%d %H:%M:%S%.6f0",
)
df = df.with_columns(pl.from_pandas(x))

Expected behavior

The column should be converted to datetime without error, as pandas and the datetime module from the Python standard library do.
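
The report does not show the standard-library call, so the following is a minimal sketch of what it might look like. It assumes Python's `.%f` directive (a literal dot followed by 1 to 6 digits) as the stand-in for the `%.6f` specifier used above, with the trailing `0` kept as a literal character in the format.

```python
from datetime import datetime

raw = "2024-06-03 20:02:48.6800000"  # 7 fractional digits, the last one is 0

# Assumption: `.%f` replaces `%.6f` here; the final `0` is a literal
# that must match the seventh fractional digit of the input.
py_format = "%Y-%m-%d %H:%M:%S.%f0"

parsed = datetime.strptime(raw, py_format)
print(parsed.isoformat())
# prints: 2024-06-03T20:02:48.680000
```

Here `%f` greedily consumes six digits and the literal `0` matches the seventh, which is the behavior the report expects from the Polars format string.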

Installed versions

--------Version info---------
Polars:               1.0.0-rc.2
Index type:           UInt32
Platform:             macOS-14.5-arm64-arm-64bit
Python:               3.11.3 (main, Sep  1 2023, 14:56:45) [Clang 14.0.3 (clang-1403.0.22.14.1)]

----Optional dependencies----
adbc_driver_manager:  1.0.0
cloudpickle:          3.0.0
connectorx:           0.3.3
deltalake:            <not installed>
fastexcel:            0.10.4
fsspec:               2023.12.2
gevent:               24.2.1
great_tables:         <not installed>
hvplot:               0.10.0
matplotlib:           3.8.4
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.2
pyarrow:              16.1.0
pydantic:             2.7.3
pyiceberg:            0.6.1
sqlalchemy:           2.0.30
torch:                <not installed>
xlsx2csv:             0.8.2
xlsxwriter:           3.2.0

Metadata

Labels: A-temporal (Area: date/time functionality), P-low (Priority: low), bug (Something isn't working), python (Related to Python Polars), upstream issue