Skip to content

Type hint user-facing methods of ggplot (and watermark) #648

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 23, 2022

Conversation

Emerentius
Copy link
Contributor

Add type hints to the public facing API of the central ggplot class (plus the watermark class). Some other functions are also type hinted to aid in the typing of the ggplot class.
I've added a submodule typing to add protocols and unions of general concepts that apply across the library. Numpy uses a submodule of the same name. Should I make this module pseudo-private (e.g. rename it to _typing) in the beginning?
Eventually, it should be public.

Mypy has released a new version 2 days ago and implicit optional types have been removed. I've addded explicit notations to fix the errors. Error codes are now shown by default so I've removed the setting from mypy.ini.

A note on import style, I see you are preferring relative imports for intra-package imports and I'm trying tried to conform to this, but there are cases where a relative import will result in a type error because mypy thinks I'm referring to the module and not the class. That's where I'm falling back to absolute imports. It happens in cases like this

from . import ggplot
foo: ggplot  # err about a module not being valid as type

@codecov
Copy link

codecov bot commented Nov 10, 2022

Codecov Report

Base: 0.00% // Head: 0.00% // No change to project coverage 👍

Coverage data is based on head (8a87e82) compared to base (1946278).
Patch coverage: 0.00% of modified lines in pull request are covered.

Additional details and impacted files
@@          Coverage Diff          @@
##            main    #648   +/-   ##
=====================================
  Coverage   0.00%   0.00%           
=====================================
  Files        153     154    +1     
  Lines       9967   10017   +50     
=====================================
- Misses      9967   10017   +50     
Impacted Files Coverage Δ
plotnine/ggplot.py 0.00% <0.00%> (ø)
plotnine/layer.py 0.00% <0.00%> (ø)
plotnine/typing.py 0.00% <0.00%> (ø)
plotnine/utils.py 0.00% <0.00%> (ø)
plotnine/watermark.py 0.00% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

return single_arg, None
elif isinstance(single_arg, aes):
return None, single_arg
else:
Copy link
Contributor Author

@Emerentius Emerentius Nov 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order of the return values here was flipped from the 2-arg-case. As far as I can see, everything but the 2 argument case is dead code, so that's why it never came up.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. The function is rather incoherent.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rewritten it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new version wouldn't pass the typechecker, because it couldn't prove that it would always return something. And in fact, if there were two mappings or two data arguments, then the code would just run through all the checks, never fail and return None. I've rewritten it so that can't happen.

has2k1 added a commit that referenced this pull request Nov 10, 2022
Triggered by the requirements for type-checking, these two have the
same signature. And it just makes sense!

REF: python/mypy#6225
     #648
has2k1 added a commit that referenced this pull request Nov 17, 2022
Triggered by the requirements for type-checking, these two have the
same signature. And it just makes sense!

REF: python/mypy#6225
     #648
has2k1 added a commit that referenced this pull request Nov 17, 2022
Triggered by the requirements for type-checking.

"mypy doesn't allow the addition of extra arguments to a
 dunder method like __radd__", input from @Emerentius.

obj.__radd__ methods now always modify the ggplot object
that obj is added to. It is effectively summarized as:

   ggobj + obj   # obj.__radd__(deepcopy(ggobj))
   ggobj += obj  # obj.__radd__(ggobj)

REF: #648
has2k1 added a commit that referenced this pull request Nov 18, 2022
Returning different types of values based on the parameters
complicates the typing specification as you end up type stub
definitions e.g.

    @overload
    def draw(param: typing.Literal[False]) -> Type1
        ...

    @overload
    def draw(param: typing.Literal[True]) -> Type2
        ...

which match each input type to an output type.

Ref: #648
has2k1 added a commit that referenced this pull request Nov 18, 2022
has2k1 added a commit that referenced this pull request Nov 18, 2022
Triggered by #648,
properly typing require multiple @overload.
has2k1 added a commit that referenced this pull request Nov 18, 2022
Triggered by #648,
type checking it required @Overloads.
Copy link
Owner

@has2k1 has2k1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done some refactoring to remedy to the highlights in comments. They have been very helpful as they allowed me to make changes for the better.

return single_arg, None
elif isinstance(single_arg, aes):
return None, single_arg
else:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. The function is rather incoherent.

return single_arg, None
elif isinstance(single_arg, aes):
return None, single_arg
else:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rewritten it.

# Returning a type guard here is not fully sound, because if `obj`
# is a callable, we aren't checking that it has no required args
# and we can't check the return value's type.
def is_data_like(obj: Any) -> TypeGuard[DataLike]:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little head spinner. At the moment I have no idea how to clear it up nicely. We can inspect, but that is overkill. Maybe remove it later!

Copy link
Contributor Author

@Emerentius Emerentius Nov 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type information is available in the user's code when they construct any of the various classes that accept data, it just gets lost along the way because both data and aesthetic mappings are accepted as positional params in any order and are then rearranged to fit.

The code could be modified so the type information is never lost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've narrowed the input type from Any to DataLike | aes | None. In that case, if the input is a Callable, then it must have the right signature or a type error would be thrown in the user's code, if they use a type checker.

Even with that, there is still the problem that the different allowed input types like callables and objects with to_pandas methods are not mutually exclusive. If an object has a to_pandas() method with the right signature but is also callable with the wrong signature, then the type checker would accept it. When conversion is then attempted by calling the object first, that would fail and a runtime error would be thrown. The inverse problem could also happen if to_pandas() conversion were attempted first in the library.

There are a couple options:

  1. Require the user to specify the conversion method
  2. Try the other conversion methods, if one fails
  3. Do nothing of the sort and just live with the corner case because objects that are both callable and have a to_pandas() method aren't common (no idea if this is true)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option 3, "Do nothing" is enough. To my knowledge df.to_pandas() only applies to polars objects. Also, the future is probably polars with first class support then pandas frames will be converted, but that is still further down the line.

@Emerentius
Copy link
Contributor Author

Now that to_pandas() has been removed, the conversion will need to be done at some later point. There is only one place that does the conversion and that is layer._make_layer_data(). Is that enough?
I had to modify that function a bit to get it to typecheck for the case where layer.data is a Callable[[DataFrame], DataFrame] and the input plot_data hasn't been converted to a DataFrame yet.

@Emerentius Emerentius force-pushed the more_type_hints branch 2 times, most recently from 82236df to 7255791 Compare November 19, 2022 15:27
@has2k1
Copy link
Owner

has2k1 commented Nov 22, 2022

I am happy with this. I will look into isort, I think it now has enough options.

@has2k1 has2k1 merged commit c0ad93c into has2k1:main Nov 23, 2022
@has2k1
Copy link
Owner

has2k1 commented Nov 23, 2022

Thanks @Emerentius, I have learnt a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants