BUG: Fix parse_dates processing with usecols and C engine #12512

gfyoung · 2016-03-02T15:14:52Z

Continuing my conquest of read_csv bugs, this PR fixes a bug reported in #9755 in how the C engine processes parse_dates. When deciding which date columns the C engine should exclude from dtype parsing, the code used the wrong indices: those of the filtered column names (after applying usecols). The correct indices are those of the original column names, because those are what the actual data processing uses later on.
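To make the index mix-up concrete, here is a minimal pure-Python sketch. The helpers select_names, noconvert_index_buggy and noconvert_index_fixed are hypothetical illustrations, not pandas internals. They only model looking up a date column's position among the filtered names versus the original names, assuming usecols is either all positions or all names.

```python
# Hypothetical helpers (not pandas internals) that model the index mix-up.

def select_names(all_names, usecols):
    """Return the column names kept by usecols (all ints or all strings)."""
    keep = set(usecols)
    if all(isinstance(c, int) for c in usecols):
        # usecols given as positions in the original header
        return [n for i, n in enumerate(all_names) if i in keep]
    # usecols given as column names; original order is preserved
    return [n for n in all_names if n in keep]


def noconvert_index_buggy(date_col, all_names, usecols):
    # Position among the *filtered* names: does not line up with the
    # original column positions once usecols drops a column before it.
    return select_names(all_names, usecols).index(date_col)


def noconvert_index_fixed(date_col, all_names):
    # Position among the *original* names, the indexing used later
    # when the data is actually processed.
    return all_names.index(date_col)


all_names = ["a", "b", "c", "d"]
usecols = [0, 2, 3]  # keeps "a", "c", "d"

print(noconvert_index_buggy("c", all_names, usecols))  # 1, which is "b" in the original header
print(noconvert_index_fixed("c", all_names))           # 2, which is "c"
```

Any column dropped by usecols before the date column shifts the filtered position, so the wrong column would be excluded from dtype parsing.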

jreback · 2016-03-06T15:23:38Z

pandas/io/parsers.py


        def _set(x):
-            if com.is_integer(x):
-                self._reader.set_noconvert(x)
+            if not identical:


seems pretty duplicative here. pls try to make minimal code changes.

gfyoung · 2016-03-08T09:57:32Z

@jreback : Tests are passing once more, so this should be good to merge AFAICT.

gfyoung · 2016-04-05T19:18:39Z

@jreback : Any updates with respect to my responses to your refactoring and check comments?

jreback · 2016-04-05T20:20:58Z

doc/source/whatsnew/v0.18.1.txt

+
+
+
+- Bug in ``read_csv`` when specifying ``usecols`` and ``parse_dates`` simultaneously with the C engine (:issue:`9755`)


don't add onto the end, put it in between other bug fixes (that way you won't have or cause conflicts)

jreback · 2016-04-06T01:48:06Z

pandas/io/tests/test_parsers.py

+        }
+        expected = DataFrame(cols, columns=['c_d', 'a'])
+
+        df = read_csv(StringIO(s), usecols=[0, 2, 3],


you are not testing this fully here. you MUST use self.read_csv. In fact we should REMOVE read_csv from the import as it's just confusing.

read_csv will only exercise the default C engine, while self.read_csv cycles through all of them (as separate tests)

Whoops, good catch. Let me go fix that.
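The engine-cycling pattern described above can be sketched with unittest: the shared tests live in a mixin and only ever call self.read_csv, and each concrete TestCase binds that method to one engine. This is a minimal sketch assuming a current pandas install; the class names ParserTests, TestCParser and TestPythonParser are hypothetical and the 2016 pandas test suite was organized differently. pd.read_csv and its engine, usecols and parse_dates parameters are real.

```python
import unittest
from io import StringIO

import pandas as pd


class ParserTests(object):
    # Mixin holding shared tests. They call self.read_csv, never
    # pd.read_csv directly, so each subclass runs them with its own engine.

    def test_usecols_with_parse_dates(self):
        data = "a,b,c,d\n1,2000-01-01,x,3\n"
        df = self.read_csv(StringIO(data), usecols=["a", "b", "d"],
                           parse_dates=["b"])
        self.assertEqual(list(df.columns), ["a", "b", "d"])
        self.assertTrue(pd.api.types.is_datetime64_any_dtype(df["b"]))


class TestCParser(ParserTests, unittest.TestCase):
    def read_csv(self, *args, **kwargs):
        kwargs["engine"] = "c"  # force the C engine
        return pd.read_csv(*args, **kwargs)


class TestPythonParser(ParserTests, unittest.TestCase):
    def read_csv(self, *args, **kwargs):
        kwargs["engine"] = "python"  # force the Python engine
        return pd.read_csv(*args, **kwargs)
```

Because the mixin does not inherit from unittest.TestCase, the shared test is only collected through the engine-specific subclasses, so each engine shows up as a separate test run.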

jreback · 2016-04-06T13:16:16Z

pandas/io/parsers.py

@@ -1133,6 +1151,7 @@ def __init__(self, src, **kwds):

        # XXX
        self.usecols = self._reader.usecols
+        self._usecols_dtype = _validate_usecols_arg(self.usecols)


I don't think you are actually using this variable (_usecols_dtype), so there's no need to save it (I think we discussed using it, but I guess it's not necessary)

Yeah...doesn't look like it. Removed.

Fixes bug in processing 'parse_dates' with the C engine in which the wrong indices (those of the filtered column names) were being used to determine the date columns to not be dtype-parsed by the C engine. The correct indices are those of the original (unfiltered) column names, as they are used later on in the actual data processing. Closes pandas-devgh-9755.

jreback · 2016-04-06T16:17:00Z

doc/source/whatsnew/v0.18.1.txt

@@ -211,6 +215,12 @@ Bug Fixes

 - Bug in ``value_counts`` when ``normalize=True`` and ``dropna=True`` where nulls still contributed to the normalized count (:issue:`12558`)
 - Bug in ``Panel.fillna()`` ignoring ``inplace=True`` (:issue:`12633`)
+
+- Bug in ``read_csv`` when specifying ``names``, ```usecols``, and ``parse_dates`` simultaneously with the C engine (:issue:`9755`)
+


fine for now, but pls don't add extra lines

Enforces the fact that 'usecols' must either be all integers (indexing) or strings (column names), as mixtures of the two are ambiguous. Closes pandas-devgh-12678.
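The rule in this commit message can be sketched as a small validator that classifies usecols and rejects mixtures. validate_usecols is a hypothetical stand-in written for illustration; it is not the actual pandas _validate_usecols_arg, whose exact return values and error message may differ.

```python
def validate_usecols(usecols):
    """Return 'integer', 'string', or None; reject mixed-type usecols.

    Hypothetical stand-in for the check described in the commit message,
    not the actual pandas _validate_usecols_arg.
    """
    if usecols is None:
        return None
    usecols = list(usecols)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(c, int) and not isinstance(c, bool) for c in usecols):
        return "integer"  # positional indexing
    if all(isinstance(c, str) for c in usecols):
        return "string"  # column names
    raise ValueError(
        "usecols must be all integers (positions) or all strings "
        "(column names), got a mixture: {!r}".format(usecols)
    )


print(validate_usecols([0, 2, 3]))   # integer
print(validate_usecols(["a", "c"]))  # string
```

A call such as validate_usecols([0, "c"]) raises ValueError instead of guessing whether each entry is a position or a name.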

jreback · 2016-04-06T19:17:51Z

thanks @gfyoung

gfyoung · 2016-04-06T19:29:39Z

@jreback : FYI, you can also cancel both of my builds for this PR on Travis.

gfyoung force-pushed the parse_dates_usecols branch 2 times, most recently from bfb010b to a78ba29 (March 3, 2016 13:36)

jreback added Bug IO CSV read_csv, to_csv labels Mar 3, 2016

gfyoung force-pushed the parse_dates_usecols branch 2 times, most recently from 58ca399 to 0f99251 (March 6, 2016 02:32)

jreback reviewed Mar 6, 2016

gfyoung force-pushed the parse_dates_usecols branch 3 times, most recently from 0fd57af to b0eef59 (March 8, 2016 06:27)

jreback added this to the 0.18.1 milestone Mar 8, 2016

gfyoung force-pushed the parse_dates_usecols branch 15 times, most recently from 8965565 to f96340e (March 14, 2016 21:05)

gfyoung force-pushed the parse_dates_usecols branch 2 times, most recently from 9dd9566 to 90d6777 (March 16, 2016 15:27)

gfyoung force-pushed the parse_dates_usecols branch 2 times, most recently from bcf7dbb to deb22fe (April 5, 2016 18:32)

jreback reviewed Apr 5, 2016

gfyoung force-pushed the parse_dates_usecols branch 3 times, most recently from 6bba242 to f74f81f (April 6, 2016 00:48)

jreback reviewed Apr 6, 2016

gfyoung force-pushed the parse_dates_usecols branch from f74f81f to f05d27a (April 6, 2016 02:15)

jreback reviewed Apr 6, 2016

gfyoung force-pushed the parse_dates_usecols branch from f05d27a to 551c9f1 (April 6, 2016 13:35)

jreback reviewed Apr 6, 2016

gfyoung force-pushed the parse_dates_usecols branch from 551c9f1 to 8e0489f (April 6, 2016 19:01)

BUG: Prevent mixed-typed usecols

f0543a4

gfyoung force-pushed the parse_dates_usecols branch from 8e0489f to f0543a4 (April 6, 2016 19:10)

jreback closed this in c6c201e Apr 6, 2016

gfyoung deleted the parse_dates_usecols branch April 6, 2016 19:19
