Commit ef48c6f

Merge pull request #9377 from cmeeren/patch-1
DOC: Clarify how date_parser is called (GH9376)
2 parents 9f439f0 + 8504456 commit ef48c6f

2 files changed

+32
-2
lines changed

doc/source/io.rst

+26-1
@@ -563,7 +563,7 @@ writing to a file). For example:
 
 Date Parsing Functions
 ~~~~~~~~~~~~~~~~~~~~~~
-Finally, the parser allows you can specify a custom ``date_parser`` function to
+Finally, the parser allows you to specify a custom ``date_parser`` function to
 take full advantage of the flexibility of the date parsing API:
 
 .. ipython:: python
@@ -573,6 +573,31 @@ take full advantage of the flexibility of the date parsing API:
 date_parser=conv.parse_date_time)
 df
 
+Pandas will try to call the ``date_parser`` function in three different ways. If
+an exception is raised, the next one is tried:
+
+1. ``date_parser`` is first called with one or more arrays as arguments,
+as defined using `parse_dates` (e.g., ``date_parser(['2013', '2013'], ['1', '2'])``)
+
+2. If #1 fails, ``date_parser`` is called with all the columns
+concatenated row-wise into a single array (e.g., ``date_parser(['2013 1', '2013 2'])``)
+
+3. If #2 fails, ``date_parser`` is called once for every row with one or more
+string arguments from the columns indicated with `parse_dates`
+(e.g., ``date_parser('2013', '1')`` for the first row, ``date_parser('2013', '2')``
+for the second, etc.)
+
+Note that performance-wise, you should try these methods of parsing dates in order:
+
+1. Try to infer the format using ``infer_datetime_format=True`` (see section below)
+
+2. If you know the format, use ``pd.to_datetime()``:
+``date_parser=lambda x: pd.to_datetime(x, format=...)``
+
+3. If you have a really non-standard format, use a custom ``date_parser`` function.
+For optimal performance, this should be vectorized, i.e., it should accept arrays
+as arguments.
+
 You can explore the date parsing functionality in ``date_converters.py`` and
 add your own. We would love to turn this module into a community supported set
 of date/time parsers. To get you started, ``date_converters.py`` contains
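
Note on the hunk above: the performance advice recommends a parser that accepts whole arrays. A minimal sketch of what such a function might look like follows; the name `parse_year_month` and the two-column year/month layout are assumptions made for illustration (they mirror the `'2013'`, `'1'` values used in the added text), not code from this commit.

```python
import pandas as pd


def parse_year_month(years, months):
    """Hypothetical vectorized parser: receives one array per column."""
    # Join the columns row-wise into "YEAR MONTH" strings.
    combined = [f"{y} {m}" for y, m in zip(years, months)]
    # Parse the whole array in one call using an explicit format.
    return pd.to_datetime(combined, format="%Y %m")


# Same sample values as in the documentation text.
dates = parse_year_month(["2013", "2013"], ["1", "2"])
print(dates)
```

Because the function takes arrays, it would be accepted on the first of the three call styles described in the hunk, so the slower per-row fallback should not be needed.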

pandas/io/parsers.py

+6-1
@@ -104,7 +104,12 @@ class ParserWarning(Warning):
 date_parser : function
 Function to use for converting a sequence of string columns to an
 array of datetime instances. The default uses dateutil.parser.parser
-to do the conversion.
+to do the conversion. Pandas will try to call date_parser in three different
+ways, advancing to the next if an exception occurs: 1) Pass one or more arrays
+(as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string
+values from the columns defined by parse_dates into a single array and pass
+that; and 3) call date_parser once for each row using one or more strings
+(corresponding to the columns defined by parse_dates) as arguments.
 dayfirst : boolean, default False
 DD/MM format dates, international and European format
 thousands : str, default None
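
Note on the docstring change: the three-step fallback it describes can be sketched in plain Python. This is an illustration of the documented behavior only, not the pandas implementation; `call_date_parser`, `per_row_parser` and `joined_parser` are hypothetical names, and joining columns with a single space is assumed from the `'2013 1'` example in the documentation text.

```python
from datetime import datetime


def call_date_parser(date_parser, *columns):
    """Hypothetical helper mimicking the three documented call styles."""
    # 1) Pass each column as a separate array argument.
    try:
        return list(date_parser(*columns))
    except Exception:
        pass
    # 2) Concatenate the columns row-wise and pass a single array.
    joined = [" ".join(row) for row in zip(*columns)]
    try:
        return list(date_parser(joined))
    except Exception:
        pass
    # 3) Call the parser once per row, one string argument per column.
    return [date_parser(*row) for row in zip(*columns)]


def per_row_parser(year, month):
    # Works only on scalar strings, so styles 1 and 2 raise TypeError.
    return datetime(int(year), int(month), 1)


def joined_parser(values):
    # Expects one array of "YEAR MONTH" strings, so style 1 raises TypeError.
    return [datetime.strptime(v, "%Y %m") for v in values]


years, months = ["2013", "2013"], ["1", "2"]
by_row = call_date_parser(per_row_parser, years, months)
by_joined = call_date_parser(joined_parser, years, months)
print(by_row == by_joined)
```

Both parsers yield the same dates here; they differ only in which attempt succeeds, which is why a parser that only handles scalars ends up on the slowest path.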
