read_html: support "decimal" argument for parsing numbers, like read_csv #12907

Closed
educhana opened this issue Apr 16, 2016 · 3 comments
Labels: Enhancement, IO HTML (read_html, to_html, Styler.apply, Styler.applymap)
@educhana

read_csv has support for declaring the decimal and thousands separator.

read_html is missing the 'decimal' parameter. It would be useful, and more consistent, for read_html to accept it too.

Example:

pd.read_html(html, thousands='.', decimal=',')
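For reference, here is a minimal sketch of the existing read_csv behaviour that this request asks read_html to mirror. The sample data and column names are made up for illustration.

```python
import io

import pandas as pd

# Made-up semicolon-separated sample using European number formatting:
# '.' groups thousands and ',' marks the decimal point.
csv_text = "item;price\nwidget;1.234,5\ngadget;2,75\n"

# read_csv accepts both separators, so the column is parsed as floats.
df = pd.read_csv(io.StringIO(csv_text), sep=";", thousands=".", decimal=",")
print(df["price"].tolist())
```

With both separators declared, the price column should come back as a float column rather than as strings.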
@jreback
Contributor

jreback commented Apr 17, 2016

This is a dupe of #8200, but we'll close that issue. This should be straightforward: it just needs to pass the argument through to the parser. Pull requests are welcome.

@jreback added the Enhancement, Difficulty Novice, and IO HTML (read_html, to_html, Styler.apply, Styler.applymap) labels Apr 17, 2016
@jreback added this to the Next Major Release milestone Apr 17, 2016
@sldion

sldion commented Apr 20, 2016

I don't think TextParser supports passing the decimal keyword with the Python engine. The functionality probably needs to be added to PythonParser.

@jreback
Contributor

jreback commented Apr 20, 2016

This defaults to the C engine, so I don't really see a problem. If it actually is one, then you could defer to the Python engine when the decimal keyword is not None.

#12933 to add the decimal option (I don't know if we have that particular issue).

Alternatively, you could post-process.

In [7]: s = pd.Series(['1,2', '2,3'])

In [8]: s
Out[8]: 
0    1,2
1    2,3
dtype: object

In [9]: pd.to_numeric(s.str.replace(',', '.'), errors='coerce')
Out[9]: 
0    1.2
1    2.3
dtype: float64
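The same post-processing idea can be extended to a whole frame and to thousands separators as well. This is only a sketch: `to_numeric_localized` is a hypothetical helper, not part of pandas, and the frame below stands in for whatever read_html returned with the numbers left as strings.

```python
import pandas as pd


def to_numeric_localized(frame, columns, thousands=".", decimal=","):
    """Hypothetical helper: parse string columns that use custom separators."""
    out = frame.copy()
    for col in columns:
        cleaned = (
            out[col]
            .astype(str)
            # Drop the grouping character first, then normalise the decimal mark.
            .str.replace(thousands, "", regex=False)
            .str.replace(decimal, ".", regex=False)
        )
        # Values that still cannot be parsed become NaN instead of raising.
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    return out


# Stand-in for a table whose numeric column was read as strings.
raw = pd.DataFrame({"name": ["a", "b", "c"], "value": ["1.234,5", "2,75", "n/a"]})
clean = to_numeric_localized(raw, ["value"])
print(clean)
```

Passing `regex=False` matters here because `'.'` would otherwise match any character when treated as a regular expression.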

@jreback modified the milestones: 0.18.2, Next Major Release May 25, 2016