CSV parse_dates not working if multiline header is specified. #8991

roldugin · 2014-12-04T02:06:36Z

I can't seem to be able to parse dates of multiheader files... Here's an example illustrating the problem I'm facing:

import pandas as pd
from io import StringIO

csv = """date,time,value
YYYY/MM/DD,HH:MM,Smp
2014/12/01,00:00,1
2014/12/01,01:00,2"""

pd.read_csv(StringIO(csv), header=[0,1], parse_dates=[['date','time']])
# ValueError: 'date' is not in list

jreback · 2014-12-04T02:14:04Z

This is a bit tricky. The problem is that the column names are really tuples.

e.g. this works but is ugly

In [22]: df = pd.read_csv(StringIO(csv), header=[0,1], parse_dates=[[('date','YYYY/MM/DD'),('time','HH:MM')]])

In [23]: df
Out[23]: 
  ('date', 'YYYY/MM/DD')_('time', 'HH:MM')  (value, Smp)
0                      2014-12-01 00:00:00             1
1                      2014-12-01 01:00:00             2

In [24]: df.dtypes
Out[24]: 
('date', 'YYYY/MM/DD')_('time', 'HH:MM')    datetime64[ns]
(value, Smp)                                         int64
dtype: object
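
To see why the plain names fail, it can help to print the column labels that `read_csv` builds from a two-row header. This is a minimal sketch that reuses the sample data from the original report; the variable name `labels` is only for illustration:

```python
import pandas as pd
from io import StringIO

csv = """date,time,value
YYYY/MM/DD,HH:MM,Smp
2014/12/01,00:00,1
2014/12/01,01:00,2"""

# With header=[0, 1] the two header rows become a two-level MultiIndex.
df = pd.read_csv(StringIO(csv), header=[0, 1])

# Each column label is a (level 0, level 1) tuple, not a plain string,
# so a bare 'date' does not match any label.
labels = list(df.columns)
print(type(df.columns).__name__)
print(labels)
```

Any lookup against these columns, including the one `parse_dates` performs, has to use the full tuple such as `('date', 'YYYY/MM/DD')`.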

This is a much better way to approach this problem (and much faster)


In [10]: df = pd.read_csv(StringIO(csv), header=[0,1])

In [11]: df
Out[11]: 
         date   time value
   YYYY/MM/DD  HH:MM   Smp
0  2014/12/01  00:00     1
1  2014/12/01  01:00     2

In [12]: pd.to_datetime(df.iloc[:,0] + ' ' + df.iloc[:,1])
Out[12]: 
0   2014-12-01 00:00:00
1   2014-12-01 01:00:00
dtype: datetime64[ns]

So I'll call this an enhancement.
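
The post-parse approach can also select columns by their tuple labels instead of by position, which makes it harder to combine the wrong pair of columns. This is a minimal sketch, assuming the sample data from the original report; the explicit `format` string and the flattened `timestamp`/`value` output layout are illustrative choices, not something prescribed in this thread:

```python
import pandas as pd
from io import StringIO

csv = """date,time,value
YYYY/MM/DD,HH:MM,Smp
2014/12/01,00:00,1
2014/12/01,01:00,2"""

# Read both header rows; the date and time columns stay as strings.
df = pd.read_csv(StringIO(csv), header=[0, 1])

# Full tuple labels of the two columns to combine.
date_col = ("date", "YYYY/MM/DD")
time_col = ("time", "HH:MM")

# Join the strings row by row, then parse them with an explicit format.
stamps = pd.to_datetime(
    df[date_col] + " " + df[time_col],
    format="%Y/%m/%d %H:%M",
)

# Illustrative output layout: single-level column names.
result = pd.DataFrame({
    "timestamp": stamps,
    "value": df[("value", "Smp")],
})
print(result)
```

Passing `format` is optional here, but it avoids relying on format inference for the slash-separated dates.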

roldugin · 2014-12-04T02:21:14Z

Thank you for the fast reply, @jreback, and for clearing this up. I like the idea of doing it after CSV parsing.

phofl · 2021-11-12T11:32:23Z

The approach suggested by @jreback stopped working sometime in the past:

data = """a,b,c
1,2,3
5,6,7
4,,6.0"""

result = pd.read_csv(
    StringIO(data),
    engine="python",
    parse_dates=[("a", "1")],
    header=[0, 1],
)

This throws:

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2021.2/scratches/scratch.py", line 16, in <module>
    result = pd.read_csv(
  File "/home/developer/PycharmProjects/pandas/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers/readers.py", line 639, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers/readers.py", line 535, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers/readers.py", line 888, in __init__
    self._engine = self._make_engine(self.engine)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers/readers.py", line 1139, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers/python_parser.py", line 154, in __init__
    self._validate_parse_dates_presence(self.columns)
  File "/home/developer/PycharmProjects/pandas/pandas/io/parsers/base_parser.py", line 280, in _validate_parse_dates_presence
    raise ValueError(
ValueError: Missing column provided to 'parse_dates': '1, a'

Process finished with exit code 1

Edit: Same behavior with the C engine.

jreback added IO CSV, Dtype Conversions, MultiIndex labels Dec 4, 2014

jreback added this to the 0.16.0 milestone Dec 4, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

mroeschke added Enhancement and removed Dtype Conversions labels Apr 11, 2021

phofl added Bug and removed Enhancement labels Nov 12, 2021

phofl mentioned this issue Nov 12, 2021

BUG: read_csv raising if parse_dates is used with MultiIndex columns #44408

Merged

jreback modified the milestones: Contributions Welcome, 1.4 Nov 14, 2021

jreback closed this as completed in #44408 Nov 14, 2021
