
Issues reading ragged CSV files #2981


Closed
wesm opened this issue Mar 6, 2013 · 4 comments
Labels: Bug, IO Data (IO issues that don't fit into a more specific label)

@wesm
Member

wesm commented Mar 6, 2013

http://stackoverflow.com/questions/15242746/handling-variable-number-of-columns-with-pandas-python

@ghost ghost closed this as completed Mar 14, 2013
@ghost ghost reopened this Mar 14, 2013
@dsm054
Contributor

dsm054 commented Mar 29, 2013

I just came across another case where, because the input data is very ragged, it's surprisingly difficult even to get the data into a DataFrame for processing. Entirely independent of performance, it'd be nice to have a canonical way that Just Worked(tm) for this.

@wesm wesm closed this as completed in 900a552 Mar 29, 2013
@wesm
Member Author

wesm commented Mar 29, 2013

And we're in business ("just works" now):

In [5]: paste
data = """1,2,3
1,2,3,4
1,2,3,4,5
1,2
1,2,3,4"""

## -- End pasted text --

In [6]: pd.read_csv(StringIO(data), names=['a', 'b', 'c', 'd', 'e'])
Out[6]: 
   a  b   c   d   e
0  1  2   3 NaN NaN
1  1  2   3   4 NaN
2  1  2   3   4   5
3  1  2 NaN NaN NaN
4  1  2   3   4 NaN
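
For readers on Python 3, the session above can be approximated as a standalone script. This is a minimal sketch, not the code from the fix itself: it assumes a recent pandas release and uses `io.StringIO` in place of whatever `StringIO` the original session had in scope. Short rows are padded with NaN up to the length of `names`.

```python
from io import StringIO

import pandas as pd

# Same ragged input as in the session above: rows have 3, 4, 5, 2 and 4 fields.
data = """1,2,3
1,2,3,4
1,2,3,4,5
1,2
1,2,3,4"""

# Supplying five column names lets the parser pad the shorter rows with NaN.
df = pd.read_csv(StringIO(data), names=["a", "b", "c", "d", "e"])
print(df)
```

Columns that receive padding come back as floating point, since NaN has no integer representation in the default dtypes.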

@dsm054
Contributor

dsm054 commented Apr 1, 2013

I think the fix may have unintended consequences:

>>> import pandas as pd
>>> from StringIO import StringIO
>>> 
>>> data = """
... 1,2
... 3,4,5
... """.strip()
>>> 
>>> pd.read_csv(StringIO(data), header=None, names=range(3))
   0  1   2
0  1  2 NaN
1  3  4   5
>>> pd.read_csv(StringIO(data), header=None, names=range(20))
*** glibc detected *** python: realloc(): invalid next size: 0x0a58f158 ***

Not sure if it's worth opening a separate issue or not.
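
Rather than guessing an upper bound such as `range(20)`, the width of the widest row could be measured first and used to build the `names` list. The sketch below is one possible approach using only the standard library; `max_field_count` is a hypothetical helper introduced here for illustration, not part of pandas, and it assumes the input is small enough to scan twice.

```python
import csv
from io import StringIO


def max_field_count(text):
    """Return the largest number of fields on any row of the CSV text.

    Hypothetical helper for illustration; returns 0 for empty input.
    """
    rows = csv.reader(StringIO(text))
    return max((len(row) for row in rows), default=0)


data = "1,2\n3,4,5"

width = max_field_count(data)
# One name per column of the widest row, mirroring names=range(3) above.
names = list(range(width))
print(width, names)
```

The resulting `names` list could then be passed to `pd.read_csv(..., header=None, names=names)`, at the cost of one extra pass over the data.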

@ghost ghost reopened this Apr 1, 2013
@wesm
Member Author

wesm commented Apr 1, 2013

Thanks. I will have a look at the C code.
