BUG: read_csv does not read double double quotes in pipe delimited txt file #41819
Comments
Did you try csv.QUOTE_NONE?
Please provide reproducible examples in the future that do not rely on external variables or files (see https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports).
Oh, then it messes it up entirely.
Please post something reproducible along with your expected output. Also, please narrow your example down to the minimum necessary to reproduce the problem.
This is exactly what I want. pandas cannot handle the doubled double quotes when they appear inside an item.
I don't think this is intended to work. It is either a quote char or not, but not both.
Thanks for the report, but it appears that the behavior is expected. Closing.
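The "either a quote char or not" point can be illustrated with Python's standard csv module, which exposes the same quoting switch (csv.QUOTE_NONE is the constant mentioned above). This is a minimal sketch on a shortened, made-up two-field row, not the reporter's file:

```python
import csv

# Shortened, hypothetical row: one normally quoted field, one doubly quoted field.
row = '"a"|""b""'

# Default quoting: '"' acts as a quote character everywhere in the row.
as_quoted = next(csv.reader([row], delimiter="|"))

# QUOTE_NONE: '"' is an ordinary character everywhere in the row.
as_literal = next(csv.reader([row], delimiter="|", quoting=csv.QUOTE_NONE))

print(as_quoted)
print(as_literal)
```

Neither mode yields a for the first field and "b" (one pair of quotes) for the second, because the parser cannot treat the character as a quote in one place and as data in another.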
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
[] (optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
This is a line in a file named data.txt, and I am trying to read it in pandas using read_csv.
"xxx"|"xxx"|"-xxxxxxx"|"xxxxx"|"x"|"xx"|""xxxxxx""|"x"|"xx"|"xxxxxxx"|""|"x"|"xxxxxx"|"X"|"xxxx"|"xxxxx"|""
Problem description
All the other fields are read into the data frame correctly, but pandas mishandles the field wrapped in two pairs of double quotes, ""xxxxxx"": it ends up as xxxxxx"" inside the data frame.
The affected item is the seventh field (index 6) in the line above.
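The report does not include the read_csv call itself, so the following is only a sketch of a self-contained reproduction. It assumes the file has no header row and that sep="|" was used; io.StringIO stands in for data.txt so nothing external is needed:

```python
import io

import pandas as pd

# The single line from data.txt as quoted in the report.
LINE = (
    '"xxx"|"xxx"|"-xxxxxxx"|"xxxxx"|"x"|"xx"|""xxxxxx""|"x"|"xx"|'
    '"xxxxxxx"|""|"x"|"xxxxxx"|"X"|"xxxx"|"xxxxx"|""\n'
)

# Assumed call: pipe separator, no header, everything kept as strings.
df = pd.read_csv(
    io.StringIO(LINE),
    sep="|",
    header=None,
    dtype=str,
    keep_default_na=False,  # keep empty fields as "" instead of NaN
)

# Seventh field (index 6): the doubly quoted item.
print(df.iloc[0, 6])
```

With the default quoting settings, the value printed for index 6 should match the reported xxxxxx"" rather than "xxxxxx".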
Expected Output
The field should be read as "xxxxxx", keeping one pair of double quotes, inside the data frame.
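One possible way to get this output, sketched here as an assumption rather than a fix endorsed in the thread, is to read the file with csv.QUOTE_NONE (so every quote is kept literally) and then strip exactly one outer pair of quotes from each value. The helper strip_outer_quotes is hypothetical, and the approach only works if no field contains the | delimiter inside quotes:

```python
import csv
import io

import pandas as pd

# The single line from data.txt as quoted in the report.
LINE = (
    '"xxx"|"xxx"|"-xxxxxxx"|"xxxxx"|"x"|"xx"|""xxxxxx""|"x"|"xx"|'
    '"xxxxxxx"|""|"x"|"xxxxxx"|"X"|"xxxx"|"xxxxx"|""\n'
)


def strip_outer_quotes(value):
    """Hypothetical helper: remove one leading and one trailing double quote."""
    if isinstance(value, str) and len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


# QUOTE_NONE makes the parser treat '"' as an ordinary character.
raw = pd.read_csv(
    io.StringIO(LINE),
    sep="|",
    header=None,
    dtype=str,
    quoting=csv.QUOTE_NONE,
    keep_default_na=False,
)

# Apply the helper to every cell, column by column.
cleaned = raw.apply(lambda col: col.map(strip_outer_quotes))

print(cleaned.iloc[0, 6])
```

The trade-off is that quoting no longer protects delimiters, so a quoted field containing | would be split into two columns.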
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 2cb9652
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-1043-gcp
Version : #46-Ubuntu SMP Mon Apr 19 19:17:04 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.2.4
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 44.0.0