-
Notifications
You must be signed in to change notification settings - Fork 233
Type hint user-facing methods of ggplot (and watermark) #648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportBase: 0.00% // Head: 0.00% // No change to project coverage 👍
Additional details and impacted files@@ Coverage Diff @@
## main #648 +/- ##
=====================================
Coverage 0.00% 0.00%
=====================================
Files 153 154 +1
Lines 9967 10017 +50
=====================================
- Misses 9967 10017 +50
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
plotnine/utils.py
Outdated
return single_arg, None | ||
elif isinstance(single_arg, aes): | ||
return None, single_arg | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order of the return values here was flipped from the 2-arg-case. As far as I can see, everything but the 2 argument case is dead code, so that's why it never came up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. The function is rather incoherent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have rewritten it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new version wouldn't pass the typechecker, because it couldn't prove that it would always return something. And in fact, if there were two mappings or two data arguments, then the code would just run through all the checks, never fail and return None. I've rewritten it so that can't happen.
Triggered by the requirements for type-checking, these two have the same signature. And it just makes sense! REF: python/mypy#6225 #648
Triggered by the requirements for type-checking, these two have the same signature. And it just makes sense! REF: python/mypy#6225 #648
Triggered by the requirements for type-checking. "mypy doesn't allow the addition of extra arguments to a dunder method like __radd__", input from @Emerentius. obj.__radd__ methods now always modify the ggplot object that obj is added to. It is effectively summarized as: ggobj + obj # obj.__radd__(deepcopy(ggobj)) ggobj += obj # obj.__radd__(ggobj) REF: #648
Returning different types of values based on the parameters complicates the typing specification as you end up type stub definitions e.g. @overload def draw(param: typing.Literal[False]) -> Type1 ... @overload def draw(param: typing.Literal[True]) -> Type2 ... which match each input type to an output type. Ref: #648
Triggered by #648, type checking it required @Overloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done some refactoring to remedy to the highlights in comments. They have been very helpful as they allowed me to make changes for the better.
plotnine/utils.py
Outdated
return single_arg, None | ||
elif isinstance(single_arg, aes): | ||
return None, single_arg | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. The function is rather incoherent.
plotnine/utils.py
Outdated
return single_arg, None | ||
elif isinstance(single_arg, aes): | ||
return None, single_arg | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have rewritten it.
# Returning a type guard here is not fully sound, because if `obj` | ||
# is a callable, we aren't checking that it has no required args | ||
# and we can't check the return value's type. | ||
def is_data_like(obj: Any) -> TypeGuard[DataLike]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little head spinner. At the moment I have no idea how to clear it up nicely. We can inspect, but that is overkill. Maybe remove it later!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type information is available in the user's code when they construct any of the various classes that accept data, it just gets lost along the way because both data and aesthetic mappings are accepted as positional params in any order and are then rearranged to fit.
The code could be modified so the type information is never lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've narrowed the input type from Any
to DataLike | aes | None
. In that case, if the input is a Callable, then it must have the right signature or a type error would be thrown in the user's code, if they use a type checker.
Even with that, there is still the problem that the different allowed input types like callables and objects with to_pandas
methods are not mutually exclusive. If an object has a to_pandas()
method with the right signature but is also callable with the wrong signature, then the type checker would accept it. When conversion is then attempted by calling the object first, that would fail and a runtime error would be thrown. The inverse problem could also happen if to_pandas()
conversion were attempted first in the library.
There are a couple options:
- Require the user to specify the conversion method
- Try the other conversion methods, if one fails
- Do nothing of the sort and just live with the corner case because objects that are both callable and have a to_pandas() method aren't common (no idea if this is true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Option 3, "Do nothing" is enough. To my knowledge df.to_pandas()
only applies to polars objects. Also, the future is probably polars with first class support then pandas frames will be converted, but that is still further down the line.
Now that |
82236df
to
7255791
Compare
7255791
to
8a87e82
Compare
I am happy with this. I will look into |
Thanks @Emerentius, I have learnt a lot. |
Add type hints to the public facing API of the central ggplot class (plus the watermark class). Some other functions are also type hinted to aid in the typing of the ggplot class.
I've added a submodule
typing
to add protocols and unions of general concepts that apply across the library. Numpy uses a submodule of the same name. Should I make this module pseudo-private (e.g. rename it to_typing
) in the beginning?Eventually, it should be public.
Mypy has released a new version 2 days ago and implicit optional types have been removed. I've addded explicit notations to fix the errors. Error codes are now shown by default so I've removed the setting from mypy.ini.
A note on import style, I see you are preferring relative imports for intra-package imports and I'm trying tried to conform to this, but there are cases where a relative import will result in a type error because mypy thinks I'm referring to the module and not the class. That's where I'm falling back to absolute imports. It happens in cases like this