BUG: 3.4 with current versions of bs4/lxml/html5lib minor parsing errors #7229

Closed
jreback opened this issue May 24, 2014 · 5 comments
Labels
Bug IO HTML read_html, to_html, Styler.apply, Styler.applymap

Contributor

jreback commented May 24, 2014

======================================================================
ERROR: test_thousands_macau_index_col (pandas.io.tests.test_html.TestReadHtml)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\tests\test_html.py", line 410, in test_thousands_macau_index_col
    dfs = self.read_html(macau_data, index_col=0, header=0)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\tests\test_html.py", line 96, in read_html
    return read_html(*args, **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\html.py", line 840, in read_html
    parse_dates, tupleize_cols, thousands, attrs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\html.py", line 709, in _parse
    raise_with_traceback(retained)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\compat\__init__.py", line 705, in raise_with_traceback
    raise exc.with_traceback(traceback)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6275: character maps to <undefined>

======================================================================
ERROR: test_thousands_macau_stats (pandas.io.tests.test_html.TestReadHtml)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\tests\test_html.py", line 401, in test_thousands_macau_stats
    attrs={'class': 'style1'})
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\tests\test_html.py", line 96, in read_html
    return read_html(*args, **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\html.py", line 840, in read_html
    parse_dates, tupleize_cols, thousands, attrs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\html.py", line 709, in _parse
    raise_with_traceback(retained)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\compat\__init__.py", line 705, in raise_with_traceback
    raise exc.with_traceback(traceback)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6275: character maps to <undefined>

----------------------------------------------------------------------
Ran 7205 tests in 459.919s
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: None
nose: 1.3.0
Cython: 0.20.1
numpy: 1.8.0
scipy: 0.13.3
statsmodels: 0.5.0
IPython: None
sphinx: None
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.3
bottleneck: 0.8.0
tables: 3.1.0
numexpr: 2.3
matplotlib: 1.3.1
openpyxl: 2.0.3
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.2
html5lib: 0.999
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.2
pymysql: None
psycopg2: None

jreback added this to the 0.14.1 milestone May 24, 2014

Contributor Author

jreback commented May 24, 2014

@cpcloud I will have to add these versions to the 3.4 build on Travis as well.

Member

cpcloud commented May 24, 2014

Cool. I'm having some trouble getting the Vagrant box set up; I'll try again this week.

cpcloud self-assigned this Jun 1, 2014

Member

cpcloud commented Jun 7, 2014

@jreback is this only on Windows?

Member

cpcloud commented Jun 7, 2014

I can't repro on Linux with 3.4 (though I get another error, which I'm going to submit a PR for).
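
One plausible reason the failure would be Windows-specific (an assumption, not something confirmed in this thread): the 'charmap' wording in the traceback is what Python's single-byte code-page codecs such as cp1252 report, and cp1252 is a common default text encoding on Windows, whereas Linux locales are usually UTF-8. Byte 0x81 is unassigned in cp1252. A minimal sketch with made-up sample bytes, not the actual test data:

```python
import locale

# Hypothetical sample: the UTF-8 encoding of U+3041 is b'\xe3\x81\x81',
# so it contains the byte 0x81 seen in the traceback.
raw = "\u3041".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return str(exc)


cp1252_result = try_decode(raw, "cp1252")  # 0x81 has no mapping in cp1252
utf8_result = try_decode(raw, "utf-8")     # the same bytes are valid UTF-8

# The platform default that open() uses when no encoding is given.
print(locale.getpreferredencoding(False))
print(cp1252_result)
print(ascii(utf8_result))
```

Printing the preferred encoding on each machine is a quick way to check whether the two environments would decode the same file differently.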

Contributor Author

jreback commented Jun 21, 2014

This isn't occurring anymore, so closing. Will reopen if it does (maybe the encoding fix fixed this?).
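
The thread does not say which encoding fix is meant. As a general illustration only, assuming the failure came from reading a file in text mode with the platform default encoding, passing an explicit encoding removes the dependence on the locale. The file name and contents below are hypothetical, and cp1252 is requested explicitly to simulate a typical Windows default:

```python
import os
import tempfile

# Hypothetical HTML snippet containing a non-ASCII character.
html = "<table><tr><td>\u3041</td><td>1,234</td></tr></table>"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.html")
    with open(path, "wb") as f:
        f.write(html.encode("utf-8"))

    # Simulated locale-dependent read: cp1252 cannot decode byte 0x81.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        locale_read_failed = False
    except UnicodeDecodeError:
        locale_read_failed = True

    # Explicit encoding: behaves the same on every platform.
    with open(path, encoding="utf-8") as f:
        text = f.read()

print(locale_read_failed)
print(text == html)
```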
