Skip to content

BUG: DataFrame.to_html validates formatters has the correct length #28632

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 7, 2019
Merged

BUG: DataFrame.to_html validates formatters has the correct length #28632

merged 2 commits into from
Oct 7, 2019

Conversation

guipleite
Copy link
Contributor

@guipleite guipleite commented Sep 26, 2019

@gabriellm1 @hugoecarl

@WillAyd WillAyd added the IO HTML read_html, to_html, Styler.apply, Styler.applymap label Sep 26, 2019
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good - general comments

@@ -561,7 +561,19 @@ def __init__(
self.sparsify = sparsify

self.float_format = float_format
self.formatters = formatters if formatters is not None else {}
if formatters is not None and (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use is_list_like from pandas._libs.lib here instead? I think should help simplify the logic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it will simplify it because is_list_like returns True to both dictionaries and lists, however we only want it to enter in this statement if formatters is a list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be simplified if you just precede this condition with:

if formatters is None:
    formatters = {}

And then go through the conditions. Right now there is a lot of duplication of conditions which makes it tougher to reason about

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made an adjustment, is it better now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd

"You can use a protocol class with isinstance() ..." "isinstance() also works with the predefined protocols in typing such as Iterable." https://mypy.readthedocs.io/en/latest/protocols.html#using-isinstance-with-protocols

We could therefore use isinstance checks that match the types added to the function signatures.

in this case the logic could be as simple as..

        if isinstance(formatters, Sequence) and len(frame.columns) != len(formatters):
            msg = (
                "Formatters length({flen}) should match"
                " DataFrame number of columns({dlen})"
            ).format(flen=len(formatters), dlen=len(frame.columns))
            raise ValueError(msg)
        self.formatters = formatters if formatters is not None else {}

thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for me and it works!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't use this pattern in the codebase, so don't make these changes in this PR. it's a discussion point going forward.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, although mentioned previously, formatters type is currently Union[List[Callable], Tuple[Callable, ...], Dict[Union[str, int], Callable] since the code uses isinstance with list, tuple and dict for flow control.

going forward, formatters type could be as permissive as Union[Sequence[Callable], Mapping[Union[str, int], Callable]

and then the protocol based isinstance checks would be more applicable.

@@ -561,7 +561,19 @@ def __init__(
self.sparsify = sparsify

self.float_format = float_format
self.formatters = formatters if formatters is not None else {}
if formatters is not None and (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be simplified if you just precede this condition with:

if formatters is None:
    formatters = {}

And then go through the conditions. Right now there is a lot of duplication of conditions which makes it tougher to reason about

"Formatters length({flen}) should match"
+ " DataFrame number of columns({dlen})"
).format(flen=len(formatters), dlen=len(frame.columns))
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic is quite complicated, can you not do

if formaters is not None and not do_len_comparision:
     raise....
self.formatters = formatters or {}

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better now?

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd WillAyd added this to the 1.0 milestone Oct 7, 2019
@WillAyd WillAyd merged commit 9db7f3e into pandas-dev:master Oct 7, 2019
@WillAyd
Copy link
Member

WillAyd commented Oct 7, 2019

Thanks @guipleite !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO HTML read_html, to_html, Styler.apply, Styler.applymap
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrame.to_html should validate that formatters has the correct length
6 participants