Regex C Engine Warning #10208
The warning is raised because your separator is longer than one character, which the C parser does not support (probably not hard to implement, but someone would need to do it). Here is a nice way to do this (and still use the C parser).
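The snippet that originally followed here is not preserved. As a minimal sketch, assuming the data is separated by a comma plus a space (the sep=", " case from the report at the end of this thread), one C-engine-compatible approach is to split on the single character "," and let skipinitialspace=True drop the space after each comma. The sample data is made up for illustration.

```python
import io

import pandas as pd

# Made-up sample data using ", " (comma followed by a space) between fields.
data = "a, b, c\n1, 2, 3\n4, 5, 6\n"

# A one-character separator keeps read_csv on the C engine;
# skipinitialspace strips the whitespace that follows each delimiter.
df = pd.read_csv(io.StringIO(data), sep=",", skipinitialspace=True, engine="c")

print(df.columns.tolist())
print(df)
```

Because engine="c" is passed explicitly, pandas should raise a ValueError instead of silently falling back if the options ever required the Python engine (for example, if sep were changed back to ", ").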
From the documentation it is not clear when a separator is treated as a regex and when it isn't. I was trying to use '::' as the separator (MovieLens dataset) when reading a file, and pandas was interpreting it as a regex when it really isn't one. I think a separate …
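For a '::' separator like the one in the MovieLens files, a minimal sketch is to request the Python engine explicitly. Since ':' has no special meaning in a regular expression, the regex '::' matches exactly the literal '::', so the regex interpretation is harmless here. The two sample rows and the column names below are made up for illustration.

```python
import io

import pandas as pd

# Two made-up rows in the MovieLens-style "::"-delimited format.
data = "1::F::1::10::48067\n2::M::56::16::70072\n"

# Hypothetical column names for illustration.
names = ["user_id", "gender", "age", "occupation", "zip"]

# Separators longer than one character (other than r"\s+") are treated as
# regular expressions and need the Python engine. ":" is not a regex
# metacharacter, so "::" still matches the literal two-character separator.
users = pd.read_csv(
    io.StringIO(data), sep="::", header=None, names=names, engine="python"
)

print(users)
```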
I don't think there's any need to adjust the API, just a clearer warning message.
I think the documentation should also be amended.
When I first read this, I wondered how pandas knows when I am using a regexp as a delimiter and when I am using a normal string. I would change this by:
Anyhow, I still believe that accepting literal string separators longer than one character is a good feature, but it might need a separate ticket/issue.
IIRC, if it's longer than one character, then by definition it defers to the Python engine.
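The rule described above can be written down as a small predicate. This is a hypothetical helper (sep_is_regex is not a pandas function) that mirrors the read_csv documentation: a separator longer than one character and different from r'\s+' is interpreted as a regular expression and forces the Python engine. It deliberately ignores sep=None, which triggers delimiter sniffing instead.

```python
def sep_is_regex(sep):
    r"""Hypothetical helper mirroring the documented read_csv rule:
    separators longer than one character and different from r'\s+'
    are interpreted as regular expressions."""
    return sep is not None and len(sep) > 1 and sep != r"\s+"


for sep in [",", "\t", "::", ", ", r"\s+"]:
    print(repr(sep), "->", "regex" if sep_is_regex(sep) else "not regex")
```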
No need to add any more options to the parsers. But as @TomAugspurger points out, a clearer error message would be fine.
@dukebody pull requests welcome.
ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
I can't view the data at all. Any help?
From the book Python for Data Analysis, p. 23. There is deprecated … And finally, you should add engine='python' (as stated above). I hope that is helpful; sorry if I totally missed the point, but I was stuck on this and going in circles for longer than I'd like.
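A small check of the advice above, assuming a recent pandas where the fallback emits pandas.errors.ParserWarning: the warning only appears when engine is left unset, and the parsed DataFrame is the same in both cases, so the warning by itself does not stop the data from loading. read_and_collect_warnings is a hypothetical helper written for this check, and the sample data is made up.

```python
import io
import warnings

import pandas as pd
from pandas.errors import ParserWarning

data = "a::b\n1::2\n3::4\n"


def read_and_collect_warnings(**kwargs):
    """Read the sample data; return (DataFrame, list of ParserWarnings raised)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        df = pd.read_csv(io.StringIO(data), sep="::", **kwargs)
    parser_warnings = [w for w in caught if issubclass(w.category, ParserWarning)]
    return df, parser_warnings


# No engine given: pandas falls back to the Python engine and warns.
df_default, warned = read_and_collect_warnings()

# Engine given explicitly: same result, no fallback warning.
df_python, quiet = read_and_collect_warnings(engine="python")

print(len(warned), len(quiet))
print(df_python)
```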
Using
pd.read_csv(..., sep=", ", ...)
I'm now getting a warning about falling back to the Python engine because regex parsing isn't supported in the C engine. That's fine, but this isn't actually using a regex. I don't have an idea for a good transition strategy, and maybe the ship has sailed, but perhaps there should be a separate read_regex or a regex keyword instead of emitting this warning for any string longer than one character.
Pandas 0.16.0.
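pandas does not provide a read_regex function or a regex keyword on read_csv, but the literal-separator behaviour requested here can be approximated on the user side. A minimal sketch, with read_csv_literal_sep as a hypothetical wrapper name: one-character separators use the C engine as usual, and longer ones are passed through re.escape so that regex metacharacters lose their meaning before the Python engine compiles them. The sample data is made up.

```python
import io
import re

import pandas as pd


def read_csv_literal_sep(buf, sep, **kwargs):
    """Hypothetical wrapper: treat `sep` as a literal string, never a regex.

    One-character separators go straight to the C engine. Longer ones are
    escaped with re.escape so characters such as "|" or "." match literally,
    and engine="python" is requested explicitly so no fallback warning fires.
    """
    if len(sep) == 1:
        return pd.read_csv(buf, sep=sep, engine="c", **kwargs)
    return pd.read_csv(buf, sep=re.escape(sep), engine="python", **kwargs)


# "||" must be escaped to act as a literal two-pipe separator.
df = read_csv_literal_sep(io.StringIO("x||y\n1||2\n"), "||")
print(df)
```

Escaping matters for separators such as "||": as a regex it is an alternation of two empty patterns, which matches the empty string rather than the two pipe characters.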