BUG: Fix parse_dates processing with usecols and C engine #12512
def _set(x):
if com.is_integer(x):
self._reader.set_noconvert(x)
if not identical:
seems pretty duplicative here. pls try to make minimal code changes.
Done.
@jreback : Tests are passing once more, so this should be good to merge AFAICT.
@jreback : Any updates with respect to my responses to your refactoring and check comments?
- Bug in ``read_csv`` when specifying ``usecols`` and ``parse_dates`` simultaneously with the C engine (:issue:`9755`)
don't add onto the end, put in-between other bug fixes (that way you won't have, nor cause conflicts)
Done.
}
expected = DataFrame(cols, columns=['c_d', 'a'])

df = read_csv(StringIO(s), usecols=[0, 2, 3],
you are not testing this fully here. you MUST use self.read_csv. In fact we should REMOVE read_csv from the import as it's just confusing.
read_csv will default to the c-engine ONLY, while self.read_csv cycles thru all of them (as separate tests)
Whoops, good catch. Let me go fix that.
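The distinction drawn in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_read_csv`, `SharedParserTests`, `TestCEngine`, and `TestPythonEngine` are hypothetical names, and the toy reader stands in for pandas. The point is that a test body written against self.read_csv runs once per engine, while a direct call to a module-level reader exercises only the default engine.

```python
import unittest


def toy_read_csv(text, engine="c"):
    """Hypothetical stand-in for a CSV reader; records the requested engine."""
    rows = [line.split(",") for line in text.strip().splitlines()]
    return {"engine": engine, "rows": rows}


class SharedParserTests:
    """Shared test bodies; each subclass picks one engine."""

    engine = None  # set by subclasses

    def read_csv(self, text):
        # Tests call self.read_csv so the subclass's engine is always used.
        return toy_read_csv(text, engine=self.engine)

    def test_rows(self):
        result = self.read_csv("a,b\n1,2")
        self.assertEqual(result["rows"], [["a", "b"], ["1", "2"]])
        self.assertEqual(result["engine"], self.engine)


class TestCEngine(SharedParserTests, unittest.TestCase):
    engine = "c"


class TestPythonEngine(SharedParserTests, unittest.TestCase):
    engine = "python"
```

Calling `toy_read_csv(...)` directly inside `test_rows` would silently use the default engine in both subclasses, which is the gap the review comment points out.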
@@ -1133,6 +1151,7 @@ def __init__(self, src, **kwds):

# XXX
self.usecols = self._reader.usecols
self._usecols_dtype = _validate_usecols_arg(self.usecols)
I don't think you are actually using this variable, _usecols_dtype, so no need to save it (I think we discussed using it, but not necessary I guess)
Yeah...doesn't look like it. Removed.
Fixes a bug in processing 'parse_dates' with the C engine in which the wrong indices (those of the filtered column names) were used to determine which date columns the C engine should not dtype-parse. The correct indices are those of the original (unfiltered) column names, as they are used later on in the actual data processing. Closes gh-9755.
@@ -211,6 +215,12 @@ Bug Fixes

- Bug in ``value_counts`` when ``normalize=True`` and ``dropna=True`` where nulls still contributed to the normalized count (:issue:`12558`)
- Bug in ``Panel.fillna()`` ignoring ``inplace=True`` (:issue:`12633`)

- Bug in ``read_csv`` when specifying ``names``, ``usecols``, and ``parse_dates`` simultaneously with the C engine (:issue:`9755`)

fine for now, but pls don't add extra lines
Done.
Enforces that 'usecols' must be either all integers (indexing) or all strings (column names), as mixtures of the two are ambiguous. Closes gh-12678.
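The rule in this commit message can be sketched in plain Python. This is a minimal illustration, not the validation code added by the PR: `usecols_kind` is a hypothetical name, and the error message is made up for the example.

```python
def usecols_kind(usecols):
    """Hypothetical check: classify usecols or reject a mixture.

    Returns "integer" when every entry is an int (positional indexing),
    "string" when every entry is a str (column names), and raises
    ValueError otherwise, since a mixture is ambiguous.
    """
    kinds = {type(col) for col in usecols}
    if kinds <= {int}:
        return "integer"
    if kinds <= {str}:
        return "string"
    raise ValueError("usecols must be all integers or all strings")
```

For example, `usecols_kind([0, 2, 3])` and `usecols_kind(["a", "c"])` are accepted, while `usecols_kind([0, "c"])` raises.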
thanks @gfyoung
@jreback : FYI, you can also cancel both of my builds for this PR on Travis.
closes #9755
closes #12678
Continuing on my conquest of read_csv bugs, this PR fixes a bug brought up in #9755 in processing parse_dates with the C engine, in which the wrong indices (those of the filtered column names) were being used to determine which date columns the C engine should not dtype-parse. The correct indices are those of the original column names, as they are used later on in the actual data processing.
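The index mismatch described above can be illustrated without pandas. A minimal sketch, assuming made-up column names and a hypothetical helper `column_index` (this is not the parser's actual code): looking a date column up in the filtered names gives a position that points at a different column in the original names.

```python
def column_index(date_col, names):
    """Return the position of date_col within the given list of names."""
    return names.index(date_col)


orig_names = ["a", "b", "c", "d"]  # illustrative column names
usecols = [0, 2, 3]                # keep columns a, c, d
filtered_names = [orig_names[i] for i in usecols]

# Looking up "c" in the filtered names gives a position that is wrong
# once it is applied to the original, unfiltered columns.
wrong = column_index("c", filtered_names)
right = column_index("c", orig_names)

print(wrong, orig_names[wrong])  # 1 b
print(right, orig_names[right])  # 2 c
```

Using the filtered position would flag column b instead of c, which matches the symptom the description attributes to the old behaviour.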