BUG: DataFrame.to_html validates formatters has the correct length #28632
Looks pretty good - general comments
pandas/io/formats/format.py
Outdated
@@ -561,7 +561,19 @@ def __init__(
        self.sparsify = sparsify

        self.float_format = float_format
        self.formatters = formatters if formatters is not None else {}
        if formatters is not None and (
Can you use is_list_like from pandas._libs.lib here instead? I think it should help simplify the logic.
I'm not sure it will simplify things, because is_list_like returns True for both dictionaries and lists, whereas we only want to enter this branch if formatters is a list.
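The point about list-likeness can be sketched without pandas. The helper below is only a rough stand-in and is not pandas' actual is_list_like implementation: it assumes "list-like" means any iterable other than a string or bytes, and the name looks_list_like is hypothetical.

```python
from collections.abc import Iterable


def looks_list_like(obj):
    # Hypothetical approximation of a list-likeness check: any iterable
    # except str/bytes. This is NOT pandas' is_list_like implementation.
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


formatters_as_list = [str, repr]
formatters_as_dict = {"a": str}

# Both containers pass the list-like check, so it cannot tell them apart.
both_list_like = looks_list_like(formatters_as_list) and looks_list_like(
    formatters_as_dict
)

# An explicit type check separates the positional case from the keyed one.
is_positional = isinstance(formatters_as_list, (list, tuple))
is_keyed = isinstance(formatters_as_dict, dict)
```

Under that assumption, a length check guarded only by a list-likeness test would also fire for dictionaries, which is the concern raised here.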
I think this can be simplified if you just precede this condition with:
if formatters is None:
    formatters = {}
And then go through the conditions. Right now there is a lot of duplication of conditions, which makes it tougher to reason about.
Made an adjustment, is it better now?
"You can use a protocol class with isinstance() ..." "isinstance() also works with the predefined protocols in typing such as Iterable." https://mypy.readthedocs.io/en/latest/protocols.html#using-isinstance-with-protocols
We could therefore use isinstance checks that match the types added to the function signatures.
In this case the logic could be as simple as:
if isinstance(formatters, Sequence) and len(frame.columns) != len(formatters):
    msg = (
        "Formatters length({flen}) should match"
        " DataFrame number of columns({dlen})"
    ).format(flen=len(formatters), dlen=len(frame.columns))
    raise ValueError(msg)
self.formatters = formatters if formatters is not None else {}
thoughts?
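For readers who want to try the suggestion outside the class, here is a minimal standalone sketch. It assumes the check is pulled into a free function and that the columns are passed as a plain list; the name validate_formatters and its signature are hypothetical and do not appear in the PR.

```python
from collections.abc import Sequence


def validate_formatters(formatters, columns):
    """Hypothetical helper: return a usable formatters object or raise."""
    # Only positional formatters (list, tuple) must match the column count;
    # a dict maps column labels to callables and may cover a subset.
    if isinstance(formatters, Sequence) and len(columns) != len(formatters):
        msg = (
            "Formatters length({flen}) should match"
            " DataFrame number of columns({dlen})"
        ).format(flen=len(formatters), dlen=len(columns))
        raise ValueError(msg)
    # None falls through the isinstance check and becomes an empty dict.
    return formatters if formatters is not None else {}


columns = ["a", "b"]
ok_list = validate_formatters([str, repr], columns)
ok_dict = validate_formatters({"a": str}, columns)
ok_none = validate_formatters(None, columns)

try:
    validate_formatters([str], columns)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

One caveat with this form: str is also a Sequence, so a string passed by mistake would be length-checked rather than rejected by type.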
Looks good to me, and it works!
We don't use this pattern in the codebase, so don't make these changes in this PR. It's a discussion point going forward.
Also, although mentioned previously, the formatters type is currently Union[List[Callable], Tuple[Callable, ...], Dict[Union[str, int], Callable]], since the code uses isinstance with list, tuple and dict for flow control.
Going forward, the formatters type could be as permissive as Union[Sequence[Callable], Mapping[Union[str, int], Callable]], and then the protocol-based isinstance checks would be more applicable.
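The two annotations can be written out as aliases to make the difference concrete. This is only a sketch: the alias names and the formatter_kind function are hypothetical, and the runtime checks use collections.abc rather than anything taken from the pandas codebase.

```python
from collections import abc
from types import MappingProxyType
from typing import Callable, Dict, List, Mapping, Sequence, Tuple, Union

# Mirrors the concrete isinstance checks on list, tuple and dict.
StrictFormatters = Union[
    List[Callable], Tuple[Callable, ...], Dict[Union[str, int], Callable]
]

# The more permissive alternative suggested for later.
PermissiveFormatters = Union[
    Sequence[Callable], Mapping[Union[str, int], Callable]
]


def formatter_kind(formatters: PermissiveFormatters) -> str:
    # Hypothetical dispatch on the abstract base classes, not on concrete types.
    if isinstance(formatters, abc.Mapping):
        return "keyed"
    if isinstance(formatters, abc.Sequence):
        return "positional"
    raise TypeError("formatters must be a sequence or a mapping")


# A read-only mapping is not a dict, yet it satisfies the permissive type.
read_only = MappingProxyType({"a": str})
kinds = (formatter_kind([str]), formatter_kind((str,)), formatter_kind(read_only))
```

With the strict alias, read_only would fail a dict-based isinstance check even though it behaves like a mapping, which is the gap the permissive type is meant to close.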
pandas/io/formats/format.py
Outdated
"Formatters length({flen}) should match"
+ " DataFrame number of columns({dlen})"
).format(flen=len(formatters), dlen=len(frame.columns))
)
This logic is quite complicated. Can you not do:
if formatters is not None and not do_len_comparison:
    raise....
self.formatters = formatters or {}
?
Is it better now?
Co-Authored-By: Simon Hawkins <[email protected]> general correct
Thanks @guipleite!
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
@gabriellm1 @hugoecarl