Regex C Engine Warning #10208

jseabold · 2015-05-26T14:12:06Z

Using pd.read_csv(..., sep=", ", ...) I'm now getting a warning about falling back to the C engine because regex parsing isn't supported in the C engine. That's fine, but this isn't actually using regex.

I don't have an idea for a good transition strategy, and maybe the ship has sailed, but perhaps there should be a separate read_regex or a regex keyword instead of emitting this warning for any string greater than length 1.

Pandas 0.16.0.

jreback · 2015-05-26T14:38:53Z

The warning arises because your separator is longer than 1 character, which the C parser does not support (it probably isn't that hard to implement, but someone would need to do it). Here is a nice way to do this and still use the C parser:

In [38]: data = """a, b, c\n1, 2, 3"""

In [39]: read_csv(StringIO(data),sep=",",engine='c',skipinitialspace=True)
Out[39]: 
   a  b  c
0  1  2  3

In [40]: read_csv(StringIO(data),sep=", ",engine='python')
Out[40]: 
   a  b  c
0  1  2  3
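For anyone who wants to reproduce the session above outside IPython, here is a minimal self-contained sketch. It assumes a recent pandas where `StringIO` comes from the standard library `io` module; the variable names `df_c` and `df_py` are only illustrative.

```python
from io import StringIO

import pandas as pd

data = "a, b, c\n1, 2, 3"

# C engine: keep a single-character separator and let pandas
# drop the whitespace that follows each comma.
df_c = pd.read_csv(StringIO(data), sep=",", engine="c", skipinitialspace=True)

# Python engine: the two-character separator ", " is handled as a regex.
df_py = pd.read_csv(StringIO(data), sep=", ", engine="python")

# Both routes should give the same frame.
print(df_c.equals(df_py))
```

The first form is usually preferable when the only extra character is whitespace, since it stays on the faster C parser.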

dukebody · 2016-03-29T09:11:19Z

From the documentation it is not clear when a separator is considered a regex and when it isn't. I was trying to use '::' as separator (MovieLens dataset) when reading a file and pandas was interpreting it as a regex, when it really isn't.

I think a separate sep_regex keyword would be cleaner. For the time being, we can also raise an exception "non-regex separators of more than 1 character are not supported". If it's the C engine that doesn't support >1 char separators, we can warn "C engine doesn't support separators longer than 1 character, falling back to Python engine".

TomAugspurger · 2016-03-29T12:57:47Z

I don't think there's any need to adjust the API, just a clearer warning message.

dukebody · 2016-03-29T13:20:30Z

I think documentation should also be amended.

sep: Delimiter to use. If sep is None, will try to automatically determine this. Regular expressions are accepted and will force use of the python parsing engine and will ignore quotes in the data.

When I first read this I wondered how pandas knows when I am using a regexp as the delimiter and when I am using a normal string. I would change this to:

sep: Delimiter to use. If sep is None, will try to automatically determine this. If it is longer than 1 character, it will be interpreted as a regular expression, will force use of the python parsing engine and will ignore quotes in the data.

Anyhow, I still believe that accepting literal string separators longer than 1 character would be a good feature, but it might need a separate ticket/issue.
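Until literal multi-character separators are supported directly, one possible workaround is to escape the separator so the regex matches it literally. This is only a sketch: the `"||"` separator and the sample data are made up for illustration, and `re.escape` is the standard-library helper, not a pandas feature.

```python
import re
from io import StringIO

import pandas as pd

data = "a||b||c\n1||2||3"
literal_sep = "||"  # hypothetical separator; "|" is a regex metacharacter

# Escape the separator so the Python engine treats it as plain text
# rather than as a regex alternation.
df = pd.read_csv(StringIO(data), sep=re.escape(literal_sep), engine="python")

print(df)
```

A separator such as `'::'` contains no regex metacharacters, so escaping it changes nothing, but escaping keeps the call safe for separators that do.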

jreback · 2016-03-29T13:26:11Z

IIRC, if it's longer than 1 character, then it by definition defers to the Python engine.

jreback · 2016-03-29T13:26:50Z

No need to add any more options to the parsers. But, as @TomAugspurger points out, a clearer error message would be fine.

jreback · 2016-03-29T13:27:17Z

@dukebody pull-requests welcome.

nhhas · 2019-09-21T17:21:43Z

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
"""Entry point for launching an IPython kernel.

I can't view the data at all. Any help?
Thanks
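For context, the message above is a warning rather than an error, so the file should still be parsed. The sketch below, which uses made-up inline data in place of a real file, records the warning and shows that passing `engine='python'` explicitly avoids it, as the message itself suggests. The names `fell_back` and `silent` are illustrative only.

```python
import warnings
from io import StringIO

import pandas as pd
from pandas.errors import ParserWarning

data = "1::F::1\n2::M::56"  # made-up sample rows

# No engine given: pandas falls back to the Python engine and warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df = pd.read_csv(StringIO(data), sep="::", header=None)
fell_back = any(issubclass(w.category, ParserWarning) for w in caught)

# Engine given explicitly: same result, no fallback warning.
with warnings.catch_warnings(record=True) as caught_explicit:
    warnings.simplefilter("always")
    df_explicit = pd.read_csv(
        StringIO(data), sep="::", header=None, engine="python"
    )
silent = not any(
    issubclass(w.category, ParserWarning) for w in caught_explicit
)

print(fell_back, silent, df.shape)
```

If the data still does not display, the cause is likely elsewhere (for example the file path or the column names), not this warning.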

unojoe2 · 2019-10-14T23:37:48Z

From the book Python for Data Analysis, pg 23: the book uses pd.read_table, which is deprecated, so change
In [XX]: users = pd.read_table... to pd.read_csv...
FILENAME: The file names referenced in the book also need to be changed (users.dat is repeated accidentally in the book). They should be:
yourFilePath/users.dat
yourFilePath/ratings.dat
yourFilePath/movies.dat

And finally, you should add (as stated above) engine='python':
# unames, rnames and mnames are the column-name lists defined in the book
users = pd.read_csv('yourFilePath/users.dat', engine='python',
                    sep='::', header=None, names=unames)
ratings = pd.read_csv('yourFilePath/ratings.dat', engine='python',
                      sep='::', header=None, names=rnames)
movies = pd.read_csv('yourFilePath/movies.dat', engine='python',
                     sep='::', header=None, names=mnames)

I hope that is helpful info, sorry if I totally missed the point, but I was stuck on this and typing in circles for longer than desired.
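To try the pattern above without downloading the dataset, here is a small sketch that reads `'::'`-separated text from an in-memory buffer. The helper `read_dat`, the sample rows, and the column names in `unames` are assumptions made for illustration; substitute the real file paths and the name lists from the book.

```python
from io import StringIO

import pandas as pd


def read_dat(source, names):
    """Hypothetical helper: read a '::'-separated file with no header row."""
    # engine='python' is passed explicitly because the C engine
    # does not support separators longer than one character.
    return pd.read_csv(
        source, sep="::", header=None, names=names, engine="python"
    )


# Assumed column names and made-up rows standing in for users.dat.
unames = ["user_id", "gender", "age", "occupation", "zip"]
sample_users = StringIO("1::F::1::10::48067\n2::M::56::16::70072")

users = read_dat(sample_users, unames)
print(users)
```

The same helper can be called once per file with its own list of names, which avoids reusing `unames` for the ratings and movies tables by mistake.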

jreback added the IO CSV read_csv, to_csv label May 26, 2015

jreback added Docs Difficulty Novice Error Reporting Incorrect or improved errors from pandas labels Mar 29, 2016

jreback added this to the 0.18.1 milestone Mar 29, 2016

dukebody mentioned this issue Apr 3, 2016

DOC: Clarify when csv separator is being parsed as regex. Resolves #10208 #12781

Closed

4 tasks

dukebody added a commit to dukebody/pandas that referenced this issue Apr 3, 2016

DOC: Clarify when csv separator is being parsed as regex. Resolves pa…

c7858f6

…ndas-dev#10208.

jreback closed this as completed in 8776596 Apr 3, 2016
