-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
TYPING: type hints for core.indexing #27527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TYPING: type hints for core.indexing #27527
Conversation
pandas/core/indexing.py
Outdated
""" convert a range argument """ | ||
return list(key) | ||
|
||
def _convert_scalar_indexer(self, key, axis: int): | ||
def _convert_scalar_indexer(self, key: Any, axis: int) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't restricted to scalar key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
monkeytype --disable-type-rewriting
gives
def _convert_scalar_indexer(
self,
key: Optional[Union[float64, List[str], str, ndarray, bytes, RangeIndex, List[Tuple[str, str]], Tuple[slice, slice], Dict[Tuple[str, str], int], Timedelta, bool, Tuple[str, int], Float64Index, int, Tuple[slice, slice, str], TimedeltaIndex, Tuple[int64, int64, int64, int64], Dict[str, int], Int64Index, int32, List[int], Index, float, List[Any], time, Tuple[int, int], Series, Set[str], int64, Interval, Tuple[slice, slice, List[str]], Tuple[slice, int], DatetimeIndex, SparseArray, Tuple[slice, List[int]], Timestamp, List[Timestamp], datetime64, Set[Tuple[str, str]], MultiIndex, datetime, Tuple[str, str], Tuple[slice, List[str]], NaTType, List[bool]]],
axis: int
) -> Optional[Union[float64, List[str], str, ndarray, bytes, RangeIndex, List[Tuple[str, str]], Tuple[slice, slice], Dict[Tuple[str, str], int], Timedelta, bool, Tuple[str, int], Float64Index, int, Tuple[slice, slice, str], TimedeltaIndex, Tuple[int64, int64, int64, int64], Dict[str, int], Int64Index, int32, List[int], Index, float, List[Any], time, Tuple[int, int], Series, Set[str], int64, Interval, Tuple[slice, slice, List[str]], Tuple[slice, int], DatetimeIndex, SparseArray, Tuple[slice, List[int]], Timestamp, List[Timestamp], datetime64, Set[Tuple[str, str]], MultiIndex, datetime, Tuple[str, str], Tuple[slice, List[str]], NaTType, List[bool]]]: ...
I could be wrong, but I assume that test functions that raise are not included in the traces. There wouldn't much point otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So no way to restrict this further? Any
just doesn't really provide the reader/developer with any more insight into how the function is supposed to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So no way to restrict this further?
probably. but not time well spent until more call sites and more imports are typed.
pandas/core/indexing.py
Outdated
import warnings | ||
|
||
import numpy as np | ||
from numpy import ndarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think outside of cython we generally avoid this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. will change.
pandas/core/indexing.py
Outdated
@@ -88,7 +94,7 @@ class _IndexSlice: | |||
B1 10 11 | |||
""" | |||
|
|||
def __getitem__(self, arg): | |||
def __getitem__(self, arg: Any) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these at least not have to be Hashable and/or some type of Mapping / Sequence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same.
Not prepared to justify each instance of Any. so will just remove them to streamline the process since that seems to be your preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn’t it possible just reveal type these then keep adding to a Union?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's probably not so many calls to getitem from within the codebase as from the tests. but, yes, that seems a reasonable approach.
pandas/core/indexing.py
Outdated
""" convert a range argument """ | ||
return list(key) | ||
|
||
def _convert_scalar_indexer(self, key, axis: int): | ||
def _convert_scalar_indexer(self, key: Any, axis: int) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So no way to restrict this further? Any
just doesn't really provide the reader/developer with any more insight into how the function is supposed to work.
i wonder if there is a decorator that we could use in tests to turn on reveal_type and show a summary |
pandas/core/indexing.py
Outdated
key = tuple([key] + [slice(None)] * (len(labels.levels) - 1)) | ||
list_items = [key] # type: List[Union[slice,str]] | ||
list_items += [slice(None)] * (len(labels.levels) - 1) | ||
key = tuple(list_items) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no happy changing code. the original is perfectly valid. will probably revert this and add a type: ignore and wait for a mypy update.
@WillAyd should we add --warn-unused-ignores
to either the CI, or warn-unused-ignores=True
to the ini file.
or happy to just run ad-hoc periodically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not reuse key
and just call it something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the issue is that you can't add a list of slices to a list of strings in an inline expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok that's strange. Let me open an issue on Mypy tracker and we can ref that in ignore comment here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think the slice is a red herring. This fails with just int and str as well:
values: Union[List[int], List[str]] = [1] + [""]
# error: List item 0 has incompatible type "str"; expected "int"
values: List[Union[int, str]] = [1] + [""]
# error: Incompatible types in assignment (expression has type "List[int]", variable has type "List[Union[int, str]]")
# note: "List" is invariant -- see http://mypy.readthedocs.io/en/latest/common_issues.html#variance
# note: Consider using "Sequence" instead, which is covariant
# error: List item 0 has incompatible type "str"; expected "int"
So I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see python/mypy#5492
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea good find. Can just add that as comment and ignore (suggested approach from mypy for dealing with these types of things)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good one question
@@ -860,7 +867,7 @@ def _handle_lowerdim_multi_index_axis0(self, tup: Tuple): | |||
axis = self.axis or 0 | |||
try: | |||
# fast path for series or for tup devoid of slices | |||
return self._get_label(tup, axis=axis) | |||
return self._get_label(tup, axis=axis) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What error was this throwing? And one below. Sorry if missed in diff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
axis needs to be int but is type Axis.
probably not yet got the return type for self.obj._get_axis_number(axis)
L112. This is defined in another module so would rather keep additions to single modules for now.
could add a cast, but that lets you lie to the type checker so can be more dangerous than ignoring for now.
not sure whether mypy will figure this out once more type hints are added.
as long as we keeping using --warn-unused-ignores
we should know if these become redundant and can be removed. otherwise will probably need to add cast in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How hard would it be to just type that return then? Understood in a different module but reason I ask is I think there is buggy axis behavior when using strings that mypy may detect for us if we don't ignore.
To illustrate, these go down different code paths though result is equivalent
In [5]: df = pd.DataFrame(np.ones((2,2)), index=pd.MultiIndex.from_tuples((('a', 'b'), ('c', 'd'))))
In [5]: df.loc[("a", "b")]
Out[8]:
0 1.0
1 1.0
Name: (a, b), dtype: float64
In [9]: df.loc(axis="index")[("a", "b")]
Out[9]:
0 1.0
1 1.0
Name: (a, b), dtype: float64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How hard would it be to just type that return then? Understood in a different module but reason I ask is I think there is buggy axis behavior when using strings that mypy may detect for us if we don't ignore.
I accept that we probably have different motivations here.
my motivation is to clean up the io.formats code. there is a lot of code duplication there, and i believe buggy code there as well.
so my view is that to support the refactoring activity, it's better to get, say 80%, of each of the imports to formats typed, than to get 100% of just a few.
so i'd prefer "adding # type: ignore comments to silence errors you don’t want to fix now." https://mypy.readthedocs.io/en/latest/existing_code.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK sure. I'm just hesitant that ignore
and Any
mean we won't ever review in earnest, but sounds like you have a plan here so that doesn't apply
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just hesitant that
ignore
andAny
mean we won't ever review in earnest
I'm sure that in time we will want to increase the strictness of the mypy checks. so hopefully this won't apply.
also, i'm using MonkeyType to annotate with the actual types passed and returned, rather than what we think they should be. This itself is quickly revealing refactoring/cleanup opportunities.
I guess we could categorize bugs into three categories for the purpose of adding type hints.
-
bugs introduced changing code just to appease the type checker. hence i would prefer ignore to cast, and cast to code changes.
-
bugs in the codebase. adding type hints is going to help enormously.
-
bugs that will be added in the future. Again here, i have the opinion that getting, say 80%, of the codebase done quickly is more likely to be a benefit than perfection on the parts that are done.
Can you merge master? Just pushed in change to remove ABCs from _typing so should run CI again to double check |
sure |
I am with @WillAyd here about the type ignore's. I am afraid that we are never going to revisit these; we did this for a while with flake8 and it was a disaster; so I would either: actually define the type or type somewhat less until this can be fixed. |
fair enough. but this defeats the purpose of why I was adding type hints... to make the types of the function parameters and return types known to other functions. |
cc @WillAyd
Any is added in a separate commit in case we can't come to an agreement ;)