read_csv dtype argument not working when there is a footer #5232
This is present in master, thanks for the report. The work-around is to coerce after reading. |
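A minimal sketch of that work-around, using the data from the report inlined through `io.StringIO` (an assumption made here so the snippet runs without a file on disk). It reads the file first and converts the column afterwards, rather than relying on `dtype` together with `skipfooter`:

```python
import io

import pandas as pd

# Data from the report, inlined so the example is self-contained.
data = """col1|col2
a|438087272980
b|399432587827
c|592706116147
d|1584843561523
footer 1
"""

# skipfooter is handled by the Python engine, so request it explicitly.
df = pd.read_csv(io.StringIO(data), sep="|", skipfooter=1, engine="python")

# Coerce after reading: turn the parsed integers into strings.
df["col2"] = df["col2"].astype(str)

print(df.dtypes)
```

Note that coercing after the read cannot recover information the parser already discarded (for example leading zeros), so this only helps when the inferred type is lossless for the data at hand.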
dtype is only supported in the c parser, but skip_footer is only supported in the python parser. To fix this bug, we have to either implement dtype in the python parser or skip_footer in the c parser (or both!). However, there's another problem here. If you explicitly set engine to python, you'll get an error:
However, that check happens before we implicitly switch the parser to python. I think it would be best to (a) issue a warning when we switch parsers automatically, and (b) move the engine switch before we validate options against the eventual engine. Probably should be a different ticket though. |
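The ordering proposed in (a) and (b) could be sketched roughly as follows. This is an illustration only: `resolve_engine` and the two option tables are hypothetical names invented for this example and do not reflect the actual pandas internals.

```python
import warnings

# Hypothetical tables of options each engine cannot handle (illustration only).
C_UNSUPPORTED = {"skip_footer"}
PYTHON_UNSUPPORTED = {"dtype"}


def resolve_engine(options, engine=None):
    """Pick the final engine first, then validate options against it."""
    requested = {name for name, value in options.items() if value}

    # (b) Decide on the eventual engine before any validation happens.
    final = engine or "c"
    if engine is None and requested & C_UNSUPPORTED:
        final = "python"
        # (a) Tell the user about the implicit switch to the slower engine.
        warnings.warn(
            "Falling back to the 'python' engine because the 'c' engine "
            "does not support: %s" % ", ".join(sorted(requested & C_UNSUPPORTED)),
            UserWarning,
        )

    # Validate against the engine that will actually be used.
    unsupported = PYTHON_UNSUPPORTED if final == "python" else C_UNSUPPORTED
    rejected = sorted(requested & unsupported)
    if rejected:
        raise ValueError(
            "The %r engine does not support: %s" % (final, ", ".join(rejected))
        )
    return final
```

With this ordering, passing both `dtype` and `skip_footer` raises an error instead of silently ignoring `dtype`, and an implicit fallback is always accompanied by a warning.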
As I am regularly parsing Gigabytes of text files, I definitely support the idea of having a warning when I'm switched to a slower parsing engine. |
Just realized that the dtype argument does not work for me in master at all using the '…
dtypes_dic
{'af': 'int',
'c': 'int',
'date': 'int',
'det': 'int',
'hour': 'int',
'minute': 'int',
'month': 'int',
'orbit': 'int',
'year': 'int'}
df = pd.read_csv(fname, delim_whitespace=True, sep='\s*',
dtype=dtypes_dic, engine='c')
df.dtypes
date float64
month float64
year float64
...
qual float64
sppsx float64
sppsy float64
Length: 40, dtype: object |
FYI, |
@michaelaye |
import StringIO
s = """\ta\tb\tc\td
\t1.0\t4.2\t2\t6
\t6.0\t2.1\t3\t6
"""
s_in = StringIO.StringIO(s)
s_in.seek(0)
df = pd.read_csv(s_in, sep='\s*', dtype={'a':np.int32})
df.dtypes
a float64
b float64
c int64
d int64
dtype: object
I was using … Maybe a down coercion is not allowed due to potential data loss? |
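On the data-loss question: one way to down-coerce after the read while guarding against silent changes is to cast only when every value survives the round trip. The helper below is a sketch under that assumption; `to_int_if_lossless` is a hypothetical name, not a pandas function.

```python
import numpy as np
import pandas as pd


def to_int_if_lossless(series, dtype=np.int32):
    """Cast a float Series to an integer dtype only if no value would change."""
    values = series.to_numpy(dtype="float64")
    if np.isnan(values).any():
        raise ValueError("NaN cannot be stored in an integer column")
    # Suppress NumPy's invalid-cast warning; out-of-range values are
    # caught by the equality check below.
    with np.errstate(invalid="ignore"):
        cast = values.astype(dtype)
    if not np.array_equal(cast, values):
        raise ValueError("casting to %s would lose data" % np.dtype(dtype).name)
    return series.astype(dtype)


a = pd.Series([1.0, 6.0], name="a")
a_int = to_int_if_lossless(a)  # integral floats, so the cast is safe
```

A column such as `b` in the example above (4.2, 2.1) would raise instead of being truncated.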
@guyrt want to take this? (I guess implement dtype in python parser and footer?) |
skip_footer in the c-parser would be nice! |
Closed, as this is already tracked in other issues, specifically as @gfyoung indicates above.
xref #7141
Datafile test.csv
col1|col2
a|438087272980
b|399432587827
c|592706116147
d|1584843561523
footer 1
Command
print pd.read_csv('test.csv', sep='|', skipfooter=1, dtype={'col2':'object'}).dtypes
Output
col1 object
col2 int64
dtype: object
Expected Output
col1 object
col2 object
dtype: object
Platform
Windows XP, Python 2.7, Pandas version 0.11.0