I'd like to open a discussion about typing for multi-dimensional arrays in general, and more specifically for NumPy. We have already been discussing this over in the NumPy issue tracker (numpy/numpy#7370) and recently opened a new repository to start writing type stubs (https://github.com/numpy/numpy_stubs).
To help guide discussion, I wrote a document outlining ideas for array shape typing.
To summarize:
- We would like to be able to type-check both data types (e.g., float64) and shapes (e.g., a 3x4 array) for multi-dimensional arrays.
- There are many use cases where checks based on dimension identity would be valuable, e.g., to indicate that a function transforms an array with shape (N, M) to shape (N,) for arbitrary integers N and M. These dimension variables look very similar to TypeVar, if TypeVar supported integers as types.
- A notion of "zero or more additional dimensions" would also be quite valuable, and is a core part of the type for many NumPy operations (generalized ufuncs). This might be naturally written with Ellipsis, e.g., (..., N) for an array with a last dimension of length N and any number of preceding dimensions. There are particular rules (broadcasting) that should be enforced when matching multiple arguments with variable numbers of dimensions.
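
To make the last two bullets concrete, here is a minimal runtime sketch in plain Python of what a checker would have to verify. It is only an illustration under my own assumptions: `match_shape` and `broadcast_shapes` are hypothetical helper names, not part of NumPy or `typing`, shapes are plain tuples of ints, and dimension variables are spelled as strings. A static checker would do the same bookkeeping at type-check time instead.

```python
from typing import Dict, Sequence, Tuple


def match_shape(pattern: Sequence[object], shape: Tuple[int, ...],
                bindings: Dict[str, int]) -> None:
    """Check `shape` against `pattern`, recording dimension variables.

    Hypothetical helper. Pattern entries: an int is a fixed length, a str
    is a dimension variable, and a leading Ellipsis stands for zero or
    more additional dimensions. Raises ValueError on a mismatch.
    """
    pattern = tuple(pattern)
    if pattern and pattern[0] is Ellipsis:
        core = pattern[1:]
        if len(shape) < len(core):
            raise ValueError(f"shape {shape} has too few dimensions")
        # Only the trailing dimensions are constrained by the pattern.
        shape = shape[len(shape) - len(core):]
    else:
        core = pattern
        if len(shape) != len(core):
            raise ValueError(f"shape {shape} does not have {len(core)} dimensions")
    for entry, size in zip(core, shape):
        if isinstance(entry, str):
            # First use binds the variable; later uses must agree with it.
            if bindings.setdefault(entry, size) != size:
                raise ValueError(
                    f"dimension {entry!r} is {bindings[entry]}, got {size}")
        elif entry != size:
            raise ValueError(f"expected length {entry}, got {size}")


def broadcast_shapes(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[int, ...]:
    """Combine two shapes NumPy-style: align on the right, length 1 stretches."""
    result = []
    for i in range(1, max(len(a), len(b)) + 1):
        x = a[-i] if i <= len(a) else 1
        y = b[-i] if i <= len(b) else 1
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {a} and {b} do not broadcast")
        result.append(y if x == 1 else x)
    return tuple(reversed(result))


# A function typed as (N, M) -> (N,): both shapes share one set of bindings.
bindings: Dict[str, int] = {}
match_shape(("N", "M"), (3, 4), bindings)   # argument shape
match_shape(("N",), (3,), bindings)         # result shape must reuse N == 3

# (..., N): any number of leading dimensions, last one must equal N.
match_shape((Ellipsis, "N"), (2, 5, 3), bindings)
```

Sharing one `bindings` dict across the argument and the result is what gives N the "same dimension" meaning, which is the role a TypeVar-like construct would play in annotations.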
This will likely require some new typing features (as well as type-checker support). Notably:
- [...] array.sum(axis=0).
- [...] NDArray[N] and NDArray[N, M].
- [...] DimensionVar as described in my doc).