
BUG: read_csv does not read double double quotes in pipe delimited txt file #41819


Closed
2 tasks done
architsingh15 opened this issue Jun 4, 2021 · 6 comments
Labels
IO CSV, Usage Question

Comments

@architsingh15

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • [] (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

The following is one line from a file named data.txt, which I am trying to read into pandas with read_csv:
"xxx"|"xxx"|"-xxxxxxx"|"xxxxx"|"x"|"xx"|""xxxxxx""|"x"|"xx"|"xxxxxxx"|""|"x"|"xxxxxx"|"X"|"xxxx"|"xxxxx"|""

df = pd.read_csv('data.txt', names=columns, dtype=column_dict, na_values=[''], keep_default_na=False, sep='|', encoding='cp1252', skiprows=1)

Problem description

All the other fields are read into the DataFrame correctly, but pandas mishandles the field wrapped in two pairs of double quotes: ""xxxxxx"" is read as xxxxxx"".
The affected field is the 7th one in the line above (index 6).

Expected Output

The field should be read as "xxxxxx" in the DataFrame, that is, with one pair of quotes kept.
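
A minimal sketch of the report, assuming a hypothetical three-field line in place of data.txt and reading from a StringIO buffer so that nothing depends on an external file. The field names and values here are stand-ins, not the reporter's data.

```python
from io import StringIO

import pandas as pd

# Hypothetical three-field stand-in for the line in data.txt.
data = '"a"|""b""|"c"\n'

# Default quoting: '"' is the quote character and doublequote is enabled.
df = pd.read_csv(StringIO(data), sep="|", header=None)

# The second field loses its leading pair of quotes and keeps the trailing
# pair, which matches the xxxxxx"" value described in the report.
second_field = df.iloc[0, 1]
print(repr(second_field))  # 'b""'
```

The expected value for this stand-in would be "b" with one pair of quotes, which is what the report asks for.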

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2cb9652
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-1043-gcp
Version : #46-Ubuntu SMP Mon Apr 19 19:17:04 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.4
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 44.0.0

architsingh15 added the Bug and Needs Triage labels Jun 4, 2021
@phofl
Member

phofl commented Jun 5, 2021

Did you try csv.QUOTE_NONE?

df = pd.read_csv(StringIO(data), na_values=[''], keep_default_na=False, sep='|', encoding='cp1252', skiprows=1, quoting=csv.QUOTE_NONE)

Please provide reproducible examples in the future, which do not rely on external variables or files (see https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports)

phofl added the IO CSV and Usage Question labels and removed the Bug and Needs Triage labels Jun 5, 2021
@architsingh15
Author

Oh, with that option the row gets messed up entirely.
"xxx"|"xxx"|"-xxxxxxx"|"xxxxx"|"x"|"xx"|""xxxxxx""|"x"|"xx"|"xxxxxxx"|""|"x"|"xxxxxx"|"X"|"xxxx"|"xxxxx"|""
If this line is supposed to be one row in the df, pandas puts the first three elements into the first column, like this:
"xxx" "xxx" "-xxxxxxx"
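
The misalignment described above cannot be checked from the thread alone, since the file and the column list are not shown. As a sketch under the assumption of a hypothetical four-field line, quoting=csv.QUOTE_NONE makes read_csv split on every pipe and keep all quote characters as data, after which one quote can be stripped from each end of every value. The regular expression and the `cleaned` name are illustrative choices, not something proposed in the thread.

```python
import csv
from io import StringIO

import pandas as pd

# Hypothetical four-field stand-in; the last field is an empty quoted string.
data = '"a"|""b""|"c"|""\n'

raw = pd.read_csv(
    StringIO(data),
    sep="|",
    header=None,
    dtype=str,              # keep every column as text
    keep_default_na=False,  # do not turn any value into NaN
    quoting=csv.QUOTE_NONE, # treat '"' as an ordinary character
)

# Remove at most one quote from the start and one from the end of each value.
cleaned = raw.apply(lambda col: col.str.replace(r'^"|"$', "", regex=True))

print(raw.iloc[0].tolist())
print(cleaned.iloc[0].tolist())
```

Because the quotes are only removed after parsing, na_values=[''] would not catch the empty fields at read time; they would have to be converted afterwards if missing values are needed.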

@phofl
Member

phofl commented Jun 5, 2021

Please post something reproducible together with your expected output. Also, please narrow your example down to the minimum necessary to reproduce the problem.

phofl added the Needs Info label Jun 5, 2021
@architsingh15
Author

The question linked below describes exactly what I want. pandas cannot handle double quotes when they appear inside an item.
https://stackoverflow.com/questions/58325337/reading-csv-file-in-pandas-with-double-double-quotes-and-embedded-commas

@phofl
Member

phofl commented Jun 5, 2021

I don't think this is intended to work. It is either a quote char or not, but not both.
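
One way to read this: inside a quoted field, a literal quote is conventionally written as two quotes, so a value of "xxxxxx" would be stored as """xxxxxx""" rather than ""xxxxxx"". The sketch below rewrites a line into that form before parsing. `escape_inner_quotes` is a hypothetical helper, it assumes the separator never occurs inside a field, and the standard-library csv module stands in for read_csv here on the assumption that both follow the same doubled-quote convention.

```python
import csv


def escape_inner_quotes(line: str, sep: str = "|") -> str:
    """Hypothetical helper: double the quotes found inside each quoted field."""
    fixed = []
    for field in line.split(sep):
        # Only touch fields that are wrapped in one outer pair of quotes.
        if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
            inner = field[1:-1].replace('"', '""')
            field = f'"{inner}"'
        fixed.append(field)
    return sep.join(fixed)


line = '"a"|""b""|"c"|""'
escaped = escape_inner_quotes(line)
row = next(csv.reader([escaped], delimiter="|"))

print(escaped)  # "a"|"""b"""|"c"|""
print(row)      # ['a', '"b"', 'c', '']
```

The rewritten text could then be wrapped in a StringIO buffer and passed to read_csv with its default quoting settings.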

phofl removed the Needs Info label Jun 5, 2021
@mroeschke
Member

Thanks for the report, but it appears that the behavior is expected. Closing.
