Type annotation considerations #373
Hi! I think we have a different focus here, and not all of what you stated as fact is correct, so I’ll do my best to clear this up:
I think that should cover it. I’m awaiting your comment about |
Hey, this has been something that's been confusing me a bit when annotating my arguments. Since Python is pretty polymorphic (until it's not), I find it hard to capture the traits an object should have using types I'm familiar with. Some examples:
What are the correct typings for these? Do I do a Union of everything I can think of that matches this? Is there a way to say: "should behave right if I call …"? I guess I'd like to see some information on best practices and common idioms in the contribution guide. I haven't seen too many scientific Python packages use type annotations, so I'm not sure how settled the conventions are. If anyone has seen some good writing on type annotations for the scientific Python stack, I'd love to take a look. |
Hi! There’s a series of abstract base classes (other languages call them interfaces) that can be used if you know specifically what you want (e.g. …). About your examples, generally I always have to dig into the code to figure such things out. Annoying, but it means that people after me can just use the type annotations instead of wasting their time doing the same.
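As a minimal sketch of using these ABCs as interfaces (standard library only; `total` and `Countdown` are hypothetical names for illustration): the parameter is annotated with an ABC, and a class can satisfy it structurally without inheriting from anything, because `collections.abc.Iterable` recognizes any class that defines `__iter__`.

```python
import collections.abc
import typing


def total(values: typing.Iterable[float]) -> float:
    """Accepts anything iterable: lists, tuples, sets, generators, ..."""
    return sum(values)


class Countdown:
    """Defines __iter__, but does not inherit from any ABC."""

    def __init__(self, start: int) -> None:
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


# Iterable's __subclasshook__ only checks for __iter__, so no inheritance is needed:
print(isinstance(Countdown(3), collections.abc.Iterable))  # True
# Sized requires __len__, which Countdown does not define:
print(isinstance(Countdown(3), collections.abc.Sized))     # False
print(total(Countdown(3)))                # 6
print(total(x / 2 for x in range(4)))     # 3.0
```

The annotation documents the *capability* the function relies on (being iterable) rather than a list of concrete classes.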
In truth, …

```python
from typing import Sequence, Union

import numpy as np

Number = Union[float, int, np.integer, np.floating]
Num1DArrayLike = Sequence[Number]
Num2DArrayLike = Sequence[Num1DArrayLike]
Num3DArrayLike = Sequence[Num2DArrayLike]
NumNDArrayLike = Union[Num1DArrayLike, Num2DArrayLike, Num3DArrayLike]
```

But if we want to be exact about …

```python
from abc import ABC
from collections.abc import Sequence

import numpy as np


class ArrayLike(ABC):
    """An array,
    any object exposing the array interface,
    an object whose __array__ method returns an array,
    or any (nested) sequence.
    """

    @classmethod
    def __subclasshook__(cls, C):
        if cls is not ArrayLike:  # don't apply the structural check to subclasses
            return NotImplemented
        if issubclass(C, np.ndarray):
            return True
        if any('__array_interface__' in B.__dict__ for B in C.__mro__):
            return True
        if any('__array__' in B.__dict__ for B in C.__mro__):
            return True
        if issubclass(C, Sequence):
            return True
        # Fall back to the normal subclass check (inheritance, `register`)
        return NotImplemented
```

Two thoughts here:
|
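The `__subclasshook__` idea behind `ArrayLike` can be exercised without NumPy. A stripped-down sketch, assuming only the standard library; `FakeArray` is a hypothetical stand-in for any third-party type that defines `__array__`:

```python
from abc import ABC
from collections.abc import Sequence


class ArrayLike(ABC):
    """Structural check: anything with __array__/__array_interface__, or any Sequence."""

    @classmethod
    def __subclasshook__(cls, C):
        if cls is not ArrayLike:
            return NotImplemented
        for attr in ('__array__', '__array_interface__'):
            if any(attr in B.__dict__ for B in C.__mro__):
                return True
        if issubclass(C, Sequence):
            return True
        return NotImplemented


class FakeArray:
    """Hypothetical array type; note that it does NOT inherit from ArrayLike."""

    def __array__(self):
        return [[1, 2], [3, 4]]


print(isinstance(FakeArray(), ArrayLike))       # True: defines __array__
print(isinstance([[1, 2], [3, 4]], ArrayLike))  # True: a list is a Sequence
print(isinstance(42, ArrayLike))                # False
print(isinstance('abc', ArrayLike))             # True: str is a Sequence, too
```

The last line shows a classic gotcha of "any (nested) sequence": strings match as well, so a real check might want to exclude `str` explicitly.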
@flying-sheep Regarding your first thought... it may cause issues when interfacing with other functions that do not have type annotations on the arguments. And users may then find it difficult to interpret the errors. |
…no? why would it? type annotations are only used for people and IDEs (unless you use mypy in your tests to check if everything is sound) |
Assuming I understand typing correctly... I'm imagining this in the same way that anndata's copy function always cast to |
How? As said, they’re just for people and IDEs. Scanpy doesn’t use them. It doesn’t throw errors in case something doesn’t fit. We could use https://pypi.org/project/typecheck-decorator/ to throw errors when something is passed that doesn’t fit the annotations. However, doing so has a performance hit and requires flawless annotations (because if the annotations were wrong, that would suddenly start throwing errors). I’m just adding type annotations to improve user friendliness by making it clearer what functions accept, and because it makes writing documentation easier. |
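To illustrate that annotations are only metadata, and what runtime enforcement would involve, here is a minimal sketch. `scale` and `enforce_annotations` are hypothetical; this is not the API of typecheck-decorator, just a naive checker that handles plain classes only:

```python
import functools
import inspect


def scale(x: float, factor: int = 2) -> float:
    return x * factor


# Annotations are stored, but never enforced; this "wrong" call just runs:
print(scale('ab'))  # abab
print(scale.__annotations__)


def enforce_annotations(func):
    """Hypothetical naive runtime checker: only plain classes such as int or str,
    not typing constructs like List[int] or Union[...]."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        for name, value in bound.arguments.items():
            expected = sig.parameters[name].annotation
            if expected is inspect.Parameter.empty or not isinstance(expected, type):
                continue  # unannotated, or something this sketch cannot check
            if not isinstance(value, expected):
                raise TypeError(f'{name}={value!r} is not a {expected.__name__}')
        return func(*args, **kwargs)

    return wrapper


checked_scale = enforce_annotations(scale)
print(checked_scale(1.5))  # 3.0
try:
    checked_scale('ab')
except TypeError as err:
    print(err)  # x='ab' is not a float
```

Every call now pays for binding and checking the arguments, and the checker is only as good as the annotations: `checked_scale(3)` is rejected too, although PEP 484 lets an `int` stand in where a `float` is annotated.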
In that case, I don't fully understand this typing and will just continue reading quietly ;). I assumed it would throw errors as for example in C++. |
There are a few uses:
i’m not planning to do 3 and 4 (yet, and probably never) |
@flying-sheep Thanks for the thorough response! This is a topic I have a lot of thoughts on, though I'm not so sure how coherently I can communicate all of them. On your first thought: The worst-case scenario I see here is me typing something so poorly that a newbie trying to follow the documentation gets horrible numba errors they can't figure out. This could happen if I hadn't thought about …

```python
>>> issubclass(np.ndarray, typing.Collection)
True
>>> issubclass(np.ndarray, typing.Sequence)
False
```

On defining … Again, I think these would be less of an issue if quality writing on type annotation usage, particularly for scientific Python, were available. As a Julia user, I found this blog post very helpful not just for understanding how to implement trait types in Julia, but also when they're useful. |
@flying-sheep As always, thank you for your thorough thoughts on the topic! And as always, my "hacking-numerics" perspective likely is not a path that is long-term sustainable. With what I wrote at the very beginning of this thread, I simply wanted to express that I thought that we shouldn't transition quickly and immediately; for the cosmetic reasons and for the reason of staying away from creating entry hurdles. I still don't think that scanpy needs to precede major packages like numpy and many others in adopting type annotations. But, in essence, I trust you and if you want to push this further I'm fine if scanpy becomes somewhat a field of experimentation for how to deal with type annotations in scientific and numerics-centered software. @ivirshup Thank you very much for your remarks, too! I agree with your concerns and examples, but wouldn't have been able to summarize them as neatly. Conclusion: @flying-sheep if you feel you have bandwidth for improving the cosmetics (thanks for what you did already, also the PR to ipython) that lead to more homogeneous docstrings (I'd say: …) PS: Thanks for the hints about Jedi etc. @flying-sheep. But likely, I'll keep playing around and reading documentation of packages using shift-tab in jupyter and develop using emacs relatively plain (there were times when I worked with quite some extensions, but these days, I'm back to almost plain for performance reasons - I know that's probably not smart, but anyways)... |
Well, that’s an improvement over the current situation of “the freeform text type annotations make me guess what I can pass and I get horrible numba errors”, right?
That looks like a bug. The docs to …

```python
>>> np.ndarray.__new__
<function ndarray.__new__(*args, **kwargs)>
>>> np.ndarray.__getitem__
<slot wrapper '__getitem__' of 'numpy.ndarray' objects>
>>> np.ndarray.__len__
<slot wrapper '__len__' of 'numpy.ndarray' objects>
```
|
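The `Collection`/`Sequence` discrepancy in the REPL output is consistent with how the two ABCs work: `Collection` checks structurally for `__len__`, `__iter__` and `__contains__`, whereas `Sequence` has no such hook and only recognizes classes that inherit from it or are explicitly registered. A stdlib-only sketch, with a hypothetical `Grid` class standing in for `np.ndarray`:

```python
from collections.abc import Collection, Sequence


class Grid:
    """Hypothetical array-like container: defines __len__, __iter__, __contains__
    and __getitem__, but inherits from no ABC (like np.ndarray)."""

    def __init__(self, *items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, item):
        return item in self._items

    def __getitem__(self, index):
        return self._items[index]


# Collection's __subclasshook__ looks for __len__, __iter__ and __contains__:
print(issubclass(Grid, Collection))  # True
# Sequence has no structural hook, so having the methods is not enough:
print(issubclass(Grid, Sequence))    # False
# Explicit registration makes it a "virtual subclass":
Sequence.register(Grid)
print(issubclass(Grid, Sequence))    # True
```

Registering a third-party class such as `np.ndarray` yourself is possible, but it changes the answer globally for every library in the process, so it is arguably a decision for the class's owner.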
Got it! so no “move fast and break things” but instead to identify problems and fix them before they occur. I think the most painful issues here are
Will do, but a comma is ambiguous, as it could mean union, intersection, or (in Python) tuple. I think
good call! I might just edit them in-PR as I did to fix the colormaps in @fidelram’s last PR. |
Great! One last thing: In docstrings, why would you interpret a comma separated list as intersection or a tuple? This is not code but for humans. I'm even having a hard time to imagine the case that gives rise to an intersection. Also, a tuple in a docstring should always be verbose with Right now, the convention across all the major packages is to simply print out a comma separated list of types if you are allowed to pass different types to a parameter. This produces the least amount of visual distraction and maintains consistency for how it's done in Scanpy in the manual docstrings and everywhere else. If there is a case where an intersection is relevant, I'd treat that separately. Finally: |
Ah, we already have a contributing sheet, but we should probably directly link to it from the readme. Do you want to edit it, before doing so, @flying-sheep? And yes, ipython/ipython#11505 is awesome! |
I created legacy-api-wrap. Only caveat: Scanpy is still 3.5 compatible, and I’m using f-string syntax in it and its dependency get_version, so it requires Python 3.6+ (which could be circumvented via future-fstrings or so)
In natural language “a, b, c” usually means “a, b, and c” (i.e. a composite or a logical intersection). And an intersection type is one that has all the attributes of all the types, like in It took me a long time to find a numpy function that uses commas for anything other than the “, optional”, but of course you’re right. They do it like that. Why don’t people think before establishing conventions… A good example of that function’s docs is also how braindead the “optional” is: for
oh, is this visible? or does it need to be uppercase for that? CONTRIBUTING.md? I don’t see it when creating an issue, but maybe because I’m an organization member? |
... I know all that. 😉 But numpy, pandas, scikit-learn, tensorflow, seaborn all have the comma-separated list as a convention and I'd really like to stick to that convention.
No, the So, let's simply take the comma-separated list. |
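Since the docstring types are meant to be generated from the annotations, here is a minimal sketch of rendering a `Union` in the agreed comma-separated style. `format_annotation` is a hypothetical helper, not Scanpy's actual implementation, and `typing.get_origin`/`get_args` assume Python 3.8+ (newer than the 3.5 baseline mentioned in this thread):

```python
from typing import Optional, Sequence, Union, get_args, get_origin


def _name(tp) -> str:
    """Render a single (non-union) annotation."""
    if tp is type(None):
        return 'None'
    if get_origin(tp) is None and isinstance(tp, type):
        return tp.__name__  # plain classes: int, float, str, ...
    # typing constructs such as Sequence[int]: drop the "typing." prefix
    return repr(tp).replace('typing.', '')


def format_annotation(annotation) -> str:
    """Hypothetical helper: render a top-level Union as a comma-separated list."""
    if get_origin(annotation) is Union:
        return ', '.join(_name(arg) for arg in get_args(annotation))
    return _name(annotation)


print(format_annotation(Union[int, float, str]))  # int, float, str
print(format_annotation(Optional[int]))           # int, None
print(format_annotation(Sequence[int]))           # Sequence[int]
```

Because the rendering is centralized, switching to the “a, b, or c” variant later would only mean changing the join; unions nested inside generics are left in their `repr` form to sidestep the comma ambiguity.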
Hm, good point. I never wondered. But yes, you might be right. |
And I know that you know! I just like to be comprehensive when presenting my arguments!
OK. I’d prefer “a, b, or c”, but I’ll concede. It would also be no problem to change it later since all will be automated 👍
Well, when I open scanpy in PyCharm and someone forgot that in a type annotation, it highlights that fact to me. Pretty nice.
Oh, then you didn’t hear of type theory. It’s a branch of logic: Type systems are formal systems, and in most of them the terms I used are well defined. The kinds of composite types I mentioned are:
|
Indeed, I have never heard of that. But I doubt that it would be considered a branch of logic.
What do you think about 3, @flying-sheep? |
What do you define as logic here? I was talking about the logic theory that encompasses formal systems and so on.
I agree, wikipedia enumerates more names, and explains where “union” comes from:
I think “discriminated union/intersection of types” would make sense here. leaving out the “discriminated/tagged/disjoint” here is the problem. in C there are actual untagged unions, which simply means that C reserves the memory for the largest of the member types and you need to keep track yourself of which type the value has. In python you can always do … And intersections are basically duck types or structural types (when anonymous) and traits/interfaces (when named). (i.e. |
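In Python, every value carries its class at runtime, so a parameter annotated with a `Union` can always be narrowed with `isinstance`; in that sense Python unions behave like tagged unions, with `type(value)` as the tag. A minimal sketch; `Keys` and `normalize_keys` are hypothetical names:

```python
from typing import Sequence, Union

# Hypothetical parameter type: a single key or several keys
Keys = Union[str, Sequence[str]]


def normalize_keys(keys: Keys) -> list:
    """Branch on the runtime type; no separate tag field is needed,
    unlike with an untagged C union."""
    if isinstance(keys, str):  # must come first: a str is also a Sequence[str]
        return [keys]
    return list(keys)


print(normalize_keys('louvain'))              # ['louvain']
print(normalize_keys(('louvain', 'leiden')))  # ['louvain', 'leiden']
```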
just passing by to say I love seeing this discussion, and particularly the type-algebra perspective @flying-sheep 😀 I have to admit I am missing what the original context was, in case anyone wants to attempt a small summary. Something about describing types in docstrings (e.g. |
sure! in short: alex said he didn’t like the switch to type annotations at all, citing a few gripes. i went on to fix them at various places (fixes are now in) and argued against a few others. i convinced alex that we should (slowly and carefully) adopt type annotations. the only thing that was missing is a consensus on how to best pretty-print …; from there on we went deeper into algebraic types and so on. without need really, as we already decided on what to do. |
I love this discussion, too!
|
Logic as the mother of all formal reasoning and its close relative set theory in mathematics. When you say type theory is a branch of logic then 90% of computer science is a branch of logic. In many contexts this might be a valid but not a very useful statement.
I love
This is what I meant when I said intersection of properties of supertypes. But I still don't know when you'd need such a type in a practical context, given that we just keep overloading functions like crazy and simply treat passed arguments dependent on their type. Any example when intersection types are actually useful? In a function we might see in Scanpy (this was the whole beginning of this discussion; I cannot imagine a case in which we need to label something intersection type in the docs). |
Just catching up on this conversation after a conference, glad to see the interest! @falexwolf, I think I can add some context here. Within type theory, these operations with types are actually set operations on a type lattice. For example, a supertype is the union of its subtypes. This is a core feature of type systems. If you're curious about the details, I really like this paper defining abstract interpretation -- which type algebra is a case of. Here's an example I think is pretty cool. Julia's …

```julia
using Distributions

# Types are capitalized
# `<:` is the subtype operator
# Distributions are parametric on their variate form and value support.
# Distributions.jl defines these (roughly) as:
#   abstract type Distribution{F<:VariateForm, S<:ValueSupport} end
# Some abstract subtypes:
#   const DiscreteDistribution{F<:VariateForm} = Distribution{F, Discrete}           # Discrete is a subtype of ValueSupport
#   const UnivariateDistribution{S<:ValueSupport} = Distribution{Univariate, S}      # Univariate is a subtype of VariateForm
#   const MultivariateDistribution{S<:ValueSupport} = Distribution{Multivariate, S}  # Multivariate is a subtype of VariateForm
# An intersection type:
#   const DiscreteUnivariateDistribution = Distribution{Univariate, Discrete}

# `===` is the absolute identity operator; this returns `true`
DiscreteUnivariateDistribution === typeintersect(DiscreteDistribution, UnivariateDistribution)

# Examples of `Distribution`s which are subtypes of `DiscreteUnivariateDistribution`; these return `true`
Hypergeometric <: DiscreteUnivariateDistribution
Bernoulli <: DiscreteUnivariateDistribution
```

Drawing out (crudely) part of the type lattice (edges denote subtype relationship, with supertypes towards the top):
That said: Julia was designed to make reasoning about types straightforward, and this has limited application to Python. An example I could think of: needing key-value lookup that is also ordered could be thought of as the intersection of |
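Python's `typing` module has no intersection operator, but the "ordered key-value lookup" example can be approximated by an ABC whose `__subclasshook__` requires *both* ABCs. A minimal sketch; `OrderedMapping` is a hypothetical name:

```python
from abc import ABC
from collections import OrderedDict
from collections.abc import Mapping, Reversible


class OrderedMapping(ABC):
    """Emulates the intersection Mapping & Reversible: a class matches
    only if it is (virtually) a subclass of both ABCs."""

    @classmethod
    def __subclasshook__(cls, C):
        if cls is OrderedMapping and issubclass(C, Mapping) and issubclass(C, Reversible):
            return True
        return NotImplemented


print(isinstance(OrderedDict(a=1), OrderedMapping))  # True: key lookup and reversible order
print(isinstance([1, 2], OrderedMapping))            # False: reversible, but not a Mapping
print(isinstance(frozenset(), OrderedMapping))       # False: neither
```

Whether a plain `dict` matches depends on the Python version: `dict` only gained `__reversed__` in Python 3.8.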
Thank you for the additional explanations, @ivirshup!
OK, I can imagine that; one just needs to define what a type lattice is exactly. But let's not get into that, I roughly picture what it is...
The concept of an intersection type as such makes a lot of sense to me, but as mentioned above, I'd see this as a special form of subclassing using the intersection of properties instead of the union. And yes, your example is nice.
I get that. I can easily imagine more examples. But I cannot imagine why you would ever need to explicitly express that in the Scanpy docs: listing via |
I agree that there's no need for it, I just like this stuff :). Just to be sure I get what you're saying: For the ordered map example, is your position that since we can say something like |
People are discussing intersections here. And I agree with you @ivirshup: That’s a great example where an intersection would be needed. Unions are only useful if you accept several things and somewhere switch behavior based on what you got like I think that Alex just means that something like that isn’t needed anywhere in scanpy. An aside about switching behavior based on types: Too bad Python hasn’t been designed with destructuring |
When I read
in the context of
then My point is (repeating what Philipp said): in practice (in all the numerical stuff that I've done so far, including Scanpy), I have never encountered the need for defining such an intersection object on the typing level. I just overload functions using @ivirshup You didn't explain the "type lattice": but according to what I learned about |
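The comment doesn't say which overloading mechanism is meant. One standard-library way to overload a function on the type of its first argument is `functools.singledispatch`, which also accepts ABCs; a minimal sketch with a hypothetical `describe` function:

```python
from collections.abc import Mapping
from functools import singledispatch


@singledispatch
def describe(data) -> str:
    """Fallback for types without a registered implementation."""
    return f'unsupported: {type(data).__name__}'


@describe.register(list)
def _(data) -> str:
    return f'list of {len(data)} items'


@describe.register(Mapping)  # ABCs work too: dict, OrderedDict, ...
def _(data) -> str:
    return f'mapping with keys {sorted(data)}'


print(describe([1, 2, 3]))         # list of 3 items
print(describe({'b': 1, 'a': 2}))  # mapping with keys ['a', 'b']
print(describe(3.0))               # unsupported: float
```

Each registered implementation handles one member of what is, in effect, a union of accepted types; no intersection type is needed for this style.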
@flying-sheep, I'm pretty sure the logical conclusion of any long discussion about types is that everything should be done in Haskell. I don't like the use of branches with …

@falexwolf, I completely agree with you on "what makes a good docstring". The knowledge overhead for numeric Python doesn't include type theory, so the docs should be interpretable without it. Ideally, interfaces are simple and the documentation makes the expected behavior clear. I'm still not sure I totally understand what the intent of the "type" vs. "class" system is in Python, so I'm often a little unsure what to do with heavily typed code. That said, if expected behaviors could be encapsulated (both formally and intuitively) with some abstract types (representing interfaces or traits) that would be a nicer solution. I don't think we're near that point in Python.

Lattices

Sorry about not giving some info on lattices, I'd thought you didn't want to get into it. It's the partially ordered set kind of lattice, where each type is an element or subset. I'll give a short Python-based example (ignoring that …).

The code:

```python
from typing import Any, Union


class A:
    pass


class B(A):
    pass


class C(A):
    pass


class D:
    pass


class E(D):
    pass
```

That defines a lattice, which can be represented as a DAG like this:
It's partially ordered in that you can't say A contains E or vice versa, but you can say things like A contains B. I think that how you're viewing it is pretty close, except the elements are types instead of their properties. My mental model has types being a collection of properties, and being a subtype means an object inherits its supertypes' properties, and can have more. |
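The partial order on the classes A to E can be queried directly with `issubclass`, and a least common supertype (the lattice "join") can be read off the MRO. A minimal sketch; `comparable` and `join` are hypothetical helpers, and `join` simply takes the first shared class in `x`'s MRO, which is a simplification: with multiple inheritance the least upper bound need not be unique.

```python
class A:
    pass


class B(A):
    pass


class C(A):
    pass


class D:
    pass


class E(D):
    pass


def comparable(x: type, y: type) -> bool:
    """In a partial order, two elements may be incomparable."""
    return issubclass(x, y) or issubclass(y, x)


def join(x: type, y: type) -> type:
    """First class in x's MRO that y also derives from (a least common supertype)."""
    for candidate in x.__mro__:
        if issubclass(y, candidate):
            return candidate
    return object  # not reached in practice: every MRO ends with object


print(comparable(A, B))  # True: A contains B
print(comparable(A, E))  # False: neither contains the other
print(join(B, C))        # <class '__main__.A'> when run as a script
print(join(B, E))        # <class 'object'>
```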
That’s what ABCs are for. |
Btw: IPython 7.2 is live and contains my signature rendering prettification… except that I made a mistake and it renders wrong. Anyway, I’m fixing that in ipython/ipython#11525 and things should be pretty in IPython 7.2.1: |
Are you talking about the
Additionally, those functions just throw an error for subscripted generics, so you definitely can't do |
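A quick demonstration of that restriction, assuming only the standard library: instance checks work against plain classes, but `typing`'s subscripted generics refuse them with a `TypeError`.

```python
from typing import List

values = [1, 2, 3]
print(isinstance(values, list))  # True: plain classes work

try:
    isinstance(values, List[int])
except TypeError as err:
    # "Subscripted generics cannot be used with class and instance checks"
    print('TypeError:', err)
```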
I only just now got the distinction between types and classes in python. So when they talk about “types”, they mean stuff in the typing module, got it. So
where a mixin is simply a regular class that happens to rely on some properties of the class it can be mixed with, and a regular class being any class that’s not a type or an ABC.
Check out the docs for abstract base classes, they explain how ABCs work (namely by …).

Mixin example:

```python
class EnumerableMixin:
    """silly mixin class for iterables"""
    def enumerate(self, start=0):
        # inside the method, `enumerate` is the builtin, not this method
        yield from enumerate(self, start)

class EnumerableList(list, EnumerableMixin):
    pass

for i, e in EnumerableList(['a', 'b']).enumerate(): print(i, e)
```

ABC example:

```python
import collections.abc
import itertools
import sys

class NonNegativeIntegers(collections.abc.Set):
    def __contains__(self, i):
        return isinstance(i, int) and i >= 0
    def __iter__(self): return itertools.count()
    # __len__ must return an int (float('inf') raises TypeError); sys.maxsize stands in for infinity
    def __len__(self): return sys.maxsize

# comparison operators like __lt__/__gt__ are mixed in!
print({0, 1, 10_000} < NonNegativeIntegers())
# `set` doesn’t inherit from collections.abc.Set; it is registered as a virtual subclass instead
print(isinstance(set(), collections.abc.Set))
```
|
I've played around with ABCs for defining interfaces, but had thought the … As far as I can tell, it's for subscripted (parametric?) annotations. For many python classes those would have to be a runtime check – e.g. |
the runtime checks would be too costly or impossible. to test for |
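Why such runtime checks can be impossible, not just costly: for an `Iterator[int]`, the only way to inspect the elements is to consume them. A minimal sketch with a hypothetical checker:

```python
from typing import Iterator


def naive_check_iterator_of_ints(it) -> bool:
    """Hypothetical runtime check for Iterator[int]: it has to consume the iterator."""
    return all(isinstance(v, int) for v in it)


numbers: Iterator[int] = iter([1, 2, 3])
print(naive_check_iterator_of_ints(numbers))  # True
print(list(numbers))  # []: the check used up the values the function was meant to process
```

For an infinite iterator such as `itertools.count()`, the same check would never return at all.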
@falexwolf wrote: