read_csv with names, usecols and parse_dates #9755
Comments
I can confirm this with master. With a simplified example:
But it seems the header/names do not matter for reproducing it. So this also reproduces it:
gives:
@brechtm Thanks for reporting! (interested in looking into it?)
gfyoung added a commit to forking-repos/pandas that referenced this issue on Apr 6, 2016
Fixes bug in processing 'parse_dates' with the C engine in which the wrong indices (those of the filtered column names) were being used to determine the date columns to not be dtype-parsed by the C engine. The correct indices are those of the original (unfiltered) column names, as they are used later on in the actual data processing. Closes pandas-devgh-9755.
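To make the index mix-up described in that commit message concrete, here is a minimal pure-Python sketch. It is not the pandas implementation: the column names, the `usecols` selection, and the helper `date_column_indices` are all hypothetical and exist only to show how positions computed from the filtered names differ from positions in the original file.

```python
def date_column_indices(names, parse_dates):
    """Return the position of each parse_dates column within names."""
    return [names.index(col) for col in parse_dates]


# Hypothetical file layout: five columns, two of which are dropped via usecols.
all_names = ["date", "dropped_a", "dropped_b", "time", "value"]
usecols = ["date", "time", "value"]
parse_dates = ["date", "time"]

# Names that remain after applying usecols, in file order.
filtered_names = [name for name in all_names if name in usecols]

# Positions taken from the filtered names (the behaviour the commit calls wrong).
wrong = date_column_indices(filtered_names, parse_dates)

# Positions taken from the original, unfiltered names (the corrected behaviour).
right = date_column_indices(all_names, parse_dates)

print("indices from filtered names:  ", wrong)
print("indices from unfiltered names:", right)
```

In this sketch the filtered positions point at `dropped_a` instead of `time`, so a parser that works on the original column positions would leave the wrong column untouched and dtype-convert the real `time` column. That would be consistent with the integers reported in this issue, although the sketch does not reproduce the actual C engine code path.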
xref #12203
The arrays passed to the `date_parser` function are different when `names` and `usecols` are specified to limit the number of parsed columns.
When running the example code below, the `date_parser` function receives two arguments: one array with '20140101' strings and one array with integers. The default `date_parser` fails to process this input.
When assigning an empty list to `DROPPED_COLUMNS` (so that all columns are parsed), the second array contains strings instead of integers, and the datetimes are parsed correctly.
The problem doesn't occur with `engine='python'`. I haven't tested the influence of the `header` and `index_col` options.
Python script:
Contents of 2014.csv: