Skip to content

TYPING: type hints for core.indexing #27527

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

simonjayhawkins
Copy link
Member

cc @WillAyd

Any is added in a separate commit in case we can't come to an agreement ;)

@simonjayhawkins simonjayhawkins added Indexing Related to indexing on series/frames, not to indexes themselves Typing type annotations, mypy/pyright type checking labels Jul 22, 2019
""" convert a range argument """
return list(key)

def _convert_scalar_indexer(self, key, axis: int):
def _convert_scalar_indexer(self, key: Any, axis: int) -> Any:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't restricted to scalar key?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

monkeytype --disable-type-rewriting gives

    def _convert_scalar_indexer(
        self,
        key: Optional[Union[float64, List[str], str, ndarray, bytes, RangeIndex, List[Tuple[str, str]], Tuple[slice, slice], Dict[Tuple[str, str], int], Timedelta, bool, Tuple[str, int], Float64Index, int, Tuple[slice, slice, str], TimedeltaIndex, Tuple[int64, int64, int64, int64], Dict[str, int], Int64Index, int32, List[int], Index, float, List[Any], time, Tuple[int, int], Series, Set[str], int64, Interval, Tuple[slice, slice, List[str]], Tuple[slice, int], DatetimeIndex, SparseArray, Tuple[slice, List[int]], Timestamp, List[Timestamp], datetime64, Set[Tuple[str, str]], MultiIndex, datetime, Tuple[str, str], Tuple[slice, List[str]], NaTType, List[bool]]],
        axis: int
    ) -> Optional[Union[float64, List[str], str, ndarray, bytes, RangeIndex, List[Tuple[str, str]], Tuple[slice, slice], Dict[Tuple[str, str], int], Timedelta, bool, Tuple[str, int], Float64Index, int, Tuple[slice, slice, str], TimedeltaIndex, Tuple[int64, int64, int64, int64], Dict[str, int], Int64Index, int32, List[int], Index, float, List[Any], time, Tuple[int, int], Series, Set[str], int64, Interval, Tuple[slice, slice, List[str]], Tuple[slice, int], DatetimeIndex, SparseArray, Tuple[slice, List[int]], Timestamp, List[Timestamp], datetime64, Set[Tuple[str, str]], MultiIndex, datetime, Tuple[str, str], Tuple[slice, List[str]], NaTType, List[bool]]]: ...

I could be wrong, but I assume that test functions that raise are not included in the traces. There wouldn't much point otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So no way to restrict this further? Any just doesn't really provide the reader/developer with any more insight into how the function is supposed to work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So no way to restrict this further?

probably. but not time well spent until more call sites and more imports are typed.

import warnings

import numpy as np
from numpy import ndarray
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think outside of cython we generally avoid this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. will change.

@@ -88,7 +94,7 @@ class _IndexSlice:
B1 10 11
"""

def __getitem__(self, arg):
def __getitem__(self, arg: Any) -> Any:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these at least not have to be Hashable and/or some type of Mapping / Sequence?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same.

Not prepared to justify each instance of Any. so will just remove them to streamline the process since that seems to be your preference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn’t it possible just reveal type these then keep adding to a Union?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably not so many calls to getitem from within the codebase as from the tests. but, yes, that seems a reasonable approach.

""" convert a range argument """
return list(key)

def _convert_scalar_indexer(self, key, axis: int):
def _convert_scalar_indexer(self, key: Any, axis: int) -> Any:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So no way to restrict this further? Any just doesn't really provide the reader/developer with any more insight into how the function is supposed to work.

@jreback
Copy link
Contributor

jreback commented Jul 24, 2019

i wonder if there is a decorator that we could use in tests to turn on reveal_type and show a summary

key = tuple([key] + [slice(None)] * (len(labels.levels) - 1))
list_items = [key] # type: List[Union[slice,str]]
list_items += [slice(None)] * (len(labels.levels) - 1)
key = tuple(list_items)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no happy changing code. the original is perfectly valid. will probably revert this and add a type: ignore and wait for a mypy update.

@WillAyd should we add --warn-unused-ignores to either the CI, or warn-unused-ignores=True to the ini file.

or happy to just run ad-hoc periodically.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not reuse key and just call it something else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue is that you can't add a list of slices to a list of strings in an inline expression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok that's strange. Let me open an issue on Mypy tracker and we can ref that in ignore comment here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think the slice is a red herring. This fails with just int and str as well:

values: Union[List[int], List[str]] = [1]  + [""]
# error: List item 0 has incompatible type "str"; expected "int"

values: List[Union[int, str]] = [1]  + [""]
# error: Incompatible types in assignment (expression has type "List[int]", variable has type "List[Union[int, str]]")
# note: "List" is invariant -- see http://mypy.readthedocs.io/en/latest/common_issues.html#variance
# note: Consider using "Sequence" instead, which is covariant
# error: List item 0 has incompatible type "str"; expected "int"

So I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea good find. Can just add that as comment and ignore (suggested approach from mypy for dealing with these types of things)

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good one question

@@ -860,7 +867,7 @@ def _handle_lowerdim_multi_index_axis0(self, tup: Tuple):
axis = self.axis or 0
try:
# fast path for series or for tup devoid of slices
return self._get_label(tup, axis=axis)
return self._get_label(tup, axis=axis) # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What error was this throwing? And one below. Sorry if missed in diff

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

axis needs to be int but is type Axis.

probably not yet got the return type for self.obj._get_axis_number(axis) L112. This is defined in another module so would rather keep additions to single modules for now.

could add a cast, but that lets you lie to the type checker so can be more dangerous than ignoring for now.

not sure whether mypy will figure this out once more type hints are added.

as long as we keeping using --warn-unused-ignores we should know if these become redundant and can be removed. otherwise will probably need to add cast in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How hard would it be to just type that return then? Understood in a different module but reason I ask is I think there is buggy axis behavior when using strings that mypy may detect for us if we don't ignore.

To illustrate, these go down different code paths though result is equivalent

In [5]: df = pd.DataFrame(np.ones((2,2)), index=pd.MultiIndex.from_tuples((('a', 'b'), ('c', 'd'))))
In [5]: df.loc[("a", "b")]
Out[8]:
0    1.0
1    1.0
Name: (a, b), dtype: float64

In [9]: df.loc(axis="index")[("a", "b")]
Out[9]:
0    1.0
1    1.0
Name: (a, b), dtype: float64

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How hard would it be to just type that return then? Understood in a different module but reason I ask is I think there is buggy axis behavior when using strings that mypy may detect for us if we don't ignore.

I accept that we probably have different motivations here.

my motivation is to clean up the io.formats code. there is a lot of code duplication there, and i believe buggy code there as well.

so my view is that to support the refactoring activity, it's better to get, say 80%, of each of the imports to formats typed, than to get 100% of just a few.

so i'd prefer "adding # type: ignore comments to silence errors you don’t want to fix now." https://mypy.readthedocs.io/en/latest/existing_code.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK sure. I'm just hesitant that ignore and Any mean we won't ever review in earnest, but sounds like you have a plan here so that doesn't apply

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just hesitant that ignore and Any mean we won't ever review in earnest

I'm sure that in time we will want to increase the strictness of the mypy checks. so hopefully this won't apply.

also, i'm using MonkeyType to annotate with the actual types passed and returned, rather than what we think they should be. This itself is quickly revealing refactoring/cleanup opportunities.

I guess we could categorize bugs into three categories for the purpose of adding type hints.

  1. bugs introduced changing code just to appease the type checker. hence i would prefer ignore to cast, and cast to code changes.

  2. bugs in the codebase. adding type hints is going to help enormously.

  3. bugs that will be added in the future. Again here, i have the opinion that getting, say 80%, of the codebase done quickly is more likely to be a benefit than perfection on the parts that are done.

@WillAyd
Copy link
Member

WillAyd commented Jul 24, 2019

Can you merge master? Just pushed in change to remove ABCs from _typing so should run CI again to double check

@simonjayhawkins
Copy link
Member Author

sure

@jreback jreback added this to the 1.0 milestone Jul 25, 2019
@jreback
Copy link
Contributor

jreback commented Jul 25, 2019

I am with @WillAyd here about the type ignore's. I am afraid that we are never going to revisit these; we did this for a while with flake8 and it was a disaster; so I would either: actually define the type or type somewhat less until this can be fixed.

@simonjayhawkins
Copy link
Member Author

I am with @WillAyd here about the type ignore's. I am afraid that we are never going to revisit these; we did this for a while with flake8 and it was a disaster; so I would either: actually define the type or type somewhat less until this can be fixed.

fair enough. but this defeats the purpose of why I was adding type hints... to make the types of the function parameters and return types known to other functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants