pandas.DataFrame.to_html() without table border and tr style #22692

Closed
norok2 opened this issue Sep 13, 2018 · 7 comments · Fixed by #45943
Labels
Bug, IO HTML (read_html, to_html, Styler.apply, Styler.applymap)

Comments

@norok2

norok2 commented Sep 13, 2018

The following code produces HTML code for the corresponding table:

import pandas as pd
import numpy as np

df = pd.DataFrame(data=np.arange(3 * 4).reshape(3, 4))
df.to_html(classes=None, border=None, justify=None)

Specifically, you get:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
    ...

However, both the border and style attributes are presentation details that should be left to CSS. The expected output is:

<table>
  <thead>
    <tr>
    ...

Besides leaving no way to get clean HTML, this contradicts the second line of the Zen of Python (https://www.python.org/dev/peps/pep-0020/): "Explicit is better than implicit."

If the current default output is valuable to the majority of pandas users, then the default values for classes, border and justify should be explicit: classes='dataframe' (or classes=('dataframe',)), border=1, justify='right'.

(Just tested on the master branch).
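Until such keywords exist, one possible workaround is to post-process the generated string. The sketch below is not part of pandas: strip_presentation is a hypothetical helper, and it runs on a literal copy of the markup quoted above rather than on a real df.to_html() call. It assumes the simple, predictable markup that to_html emits; a regular expression like this is not a general HTML parser.

```python
import re

# Sample of the markup quoted above; in practice this would be df.to_html().
html = (
    '<table border="1" class="dataframe">\n'
    '  <thead>\n'
    '    <tr style="text-align: right;">\n'
    '      <th></th>\n'
    '    </tr>\n'
    '  </thead>\n'
    '</table>'
)


def strip_presentation(markup: str) -> str:
    """Hypothetical helper: drop every attribute from opening <table> and <tr> tags."""
    pattern = re.compile(r"<(table|tr)\b[^>]*>")
    # Rebuild each matched opening tag with its name only.
    return pattern.sub(lambda m: f"<{m.group(1)}>", markup)


clean = strip_presentation(html)
print(clean)
```

Note that this removes all attributes on those two tags, including any user-supplied classes, so it only suits the case where the styling is done entirely in CSS.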

@TomAugspurger
Contributor

I thought we had issues for these, but couldn't find any on a quick search.

Broadly agreed with you, just need to make the changes backwards compatible.

@TomAugspurger added the Output-Formatting (__repr__ of pandas objects, to_string), IO HTML (read_html, to_html, Styler.apply, Styler.applymap), and Difficulty Intermediate labels on Sep 13, 2018
@norok2
Author

norok2 commented Sep 13, 2018

Perhaps, you saw this: https://stackoverflow.com/questions/51460112/pandas-dataframe-to-html-without-table-border-and-tr-style

Anyway, I could probably work on a patch. It should not be terribly difficult. Just let me know what is supposed to be there. I see that, right now, quite a few pandas parameters default to None, and some sensible behavior is then guessed from that.

So, what's your policy for making a change like this backward compatible? I can read some docs, if you have a link lying around.

@TomAugspurger
Contributor

Contributing docs are at http://pandas-docs.github.io/pandas-docs-travis/contributing.html

For the automatically added dataframe class, we'll need a new keyword like dataframe_class=True by default, which can be set to False to leave dataframe out of the classes list (and maybe we'll want False to be the future default).

For the text-align issue, not sure... Maybe tr_style="text-align: right;" as the default? And if that's falsey, then we don't include a style? Not sure how hard that would be to implement.
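To make the two suggestions concrete, here is a minimal sketch of how the opening tags could be assembled. It is not pandas code: table_open_tag and tr_open_tag are hypothetical helpers, and dataframe_class and tr_style are only the keyword names floated in this comment, not existing to_html parameters.

```python
from typing import Optional, Sequence, Union


def table_open_tag(
    classes: Optional[Union[str, Sequence[str]]] = None,
    dataframe_class: bool = True,   # proposed keyword, hypothetical
    border: Optional[int] = 1,
) -> str:
    """Hypothetical: build the opening <table> tag from the proposed keywords."""
    if classes is None:
        class_list = []
    elif isinstance(classes, str):
        class_list = classes.split()
    else:
        class_list = list(classes)
    if dataframe_class:
        # Keep today's behavior: "dataframe" is prepended to user classes.
        class_list = ["dataframe"] + class_list
    attrs = []
    if border:
        attrs.append(f'border="{border}"')
    if class_list:
        attrs.append(f'class="{" ".join(class_list)}"')
    return "<table" + "".join(" " + a for a in attrs) + ">"


def tr_open_tag(tr_style: Optional[str] = "text-align: right;") -> str:
    """Hypothetical: a falsey tr_style omits the style attribute entirely."""
    if tr_style:
        return f'<tr style="{tr_style}">'
    return "<tr>"


print(table_open_tag(classes=["my_class"]))
print(table_open_tag(dataframe_class=False, border=None))
print(tr_open_tag(None))
```

With the defaults, the sketch reproduces the markup shown at the top of the issue; switching both keywords off yields the bare tags the issue asks for.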

@norok2
Author

norok2 commented Sep 13, 2018

I don't think a new keyword is really needed, as long as the old one defaults to something transparent. Then the user may decide what to do with it. As for the text-align issue, I would see some value in having a tr_style=... keyword.

The problem is definitely how to make sure not to break current code. In other projects, I have seen this handled by issuing a warning for one intermediate version. So perhaps we could do it in two steps. At first, an explicit None would just issue a warning, and the default values would be set to mimic the current behavior. Then, in the next version, None would do the right thing, and the default would change to whatever you think is the right default.

If to_html is written in standard Python (i.e. not Cython / ufunc), these changes should be trivial.
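The first step of that two-step path could look roughly like the sketch below. It is an illustration only: resolve_border and the _UNSET sentinel are hypothetical names, and the warning text is made up. The sentinel is what lets the function tell "argument not passed" apart from an explicit None.

```python
import warnings

_UNSET = object()  # sentinel: distinguishes "not passed" from an explicit None


def resolve_border(border=_UNSET):
    """Hypothetical step-one behavior for a single keyword."""
    if border is _UNSET:
        # Nothing passed: keep producing today's output.
        return 1
    if border is None:
        # Explicit None: warn now, change the meaning in a later release.
        warnings.warn(
            "border=None will omit the border attribute in a future version; "
            "it currently behaves like border=1.",
            FutureWarning,
            stacklevel=2,
        )
        return 1
    return border
```

In the second step, the None branch would return None (no attribute) and the warning would be removed.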

@TomAugspurger
Contributor

I don't think a new keyword is really needed, as long as the old one defaults to something transparent.

By old one, you mean classes? I'm just not sure how we would preserve classes=['my_class'] returning the current default of classes=['dataframe', 'my_class'] without a second keyword. I may be missing something though.

The problem is definitely how to make sure not to break current code.

To be clear: there are two discussions to be had.

  1. How to produce a clean HTML table without all the junk pandas adds :)
  2. Changing the defaults to be clean

For now, I'd be happy to see 1 fixed. I'm less sure about changing the defaults.

@norok2
Author

norok2 commented Sep 14, 2018

By old one, you mean classes? I'm just not sure how we would preserve classes=['my_class'] returning the current default of classes=['dataframe', 'my_class'] without a second keyword. I may be missing something though.

I am realizing that None is used very often as a proxy for internal get_option(), and this somehow interferes with

2. Changing the defaults to be clean

Then perhaps a good path towards

1. How to produce a clean HTML table without all the junk pandas adds :)

could be:

df.to_html(classes='', border='', justify='unset')

There are a few caveats, though:

  • the keyword classes is currently not managed by a get_option() call and the dataframe class is hardcoded (should this be a separate PR?)
  • the solution is not too consistent, but apparently unset is already accepted by pandas AND it appears to be unused by CSS's text-align, e.g. https://jigsaw.w3.org/css-validator/validator would silently convert it to initial. Adding a new accepted value for justify would require some control over the other conversions that use the DataFrameFormatter class.

Also, I am not quite sure how to document this behavior, but the code and the tests should be more or less ready. I can prepare a PR anytime.

@TomAugspurger
Contributor

I would favor df.to_html(classes=False, border=False) rather than empty strings, as in many places pandas allows keyword=string as a shorthand for keyword=[string] when a list of strings is really expected. But I like your general idea.
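The ambiguity behind this preference can be sketched as follows. normalize_classes is a hypothetical helper, not the pandas implementation; it assumes that a string is shorthand for a list of class names and that the dataframe class is always prepended unless the caller opts out.

```python
def normalize_classes(classes):
    """Hypothetical: map the classes keyword to a list of CSS class names."""
    if classes is False:
        return []                   # explicit opt-out: no class attribute at all
    if classes is None:
        return ["dataframe"]        # mimics the current default
    if isinstance(classes, str):
        classes = classes.split()   # string shorthand for a list of names
    return ["dataframe", *classes]


print(normalize_classes("my_class"))
print(normalize_classes(""))
print(normalize_classes(False))
```

Under the shorthand rule, an empty string reads naturally as an empty list of extra classes, so it still yields the dataframe class; False has no such second reading.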

classes is currently not managed by a get_option() call and the dataframe class is hardcoded (should this be a separate PR?)

Yes probably.

I'm not really familiar with the justification code.

@mroeschke added the Bug label on May 7, 2020
@mroeschke removed the Output-Formatting (__repr__ of pandas objects, to_string) label on Jun 22, 2021
@jreback added this to the 1.5 milestone on Apr 10, 2022