Proposal to standardize element-wise arithmetic operations #9


Closed
kgryte opened this issue Jul 9, 2020 · 20 comments

kgryte (Contributor) commented Jul 9, 2020

Based on our analysis of array library APIs, we know that element-wise arithmetic operations are both universally implemented and commonly used. Accordingly, this issue proposes to standardize the following arithmetic operations:

Arithmetic Operations

  • add
  • subtract
  • multiply (mul)
  • divide (div)

Criteria

  1. Commonly implemented across array libraries.
  2. Commonly used by array library consumers.
  3. Operates on two arrays.

Questions

  1. Naming conventions? Currently, this proposal is biased toward verbose API names following NumPy.
  2. Are there any APIs listed above which should not be standardized?
  3. Are there basic arithmetic operations not listed above which should be standardized? Preferably, any additions should be supported by usage data.
kgryte (Contributor, Author) commented Jul 20, 2020

I compiled generalized signatures for the element-wise arithmetic operations listed above (one per library); the raw signature data can be found here.

NumPy

numpy.<name>(x1, x2, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj]) → ndarray

CuPy

cupy.<name>(x1, x2, out=None, dtype=None) → ndarray

dask.array

(no element-wise arithmetic functions documented; see the discussion of dask.array.add later in this thread)

JAX

jax.numpy.<name>(x1, x2) → ndarray

MXNet

np.<name>(x1, x2, out=None, **kwargs) → ndarray

PyTorch

torch.<name>(input, other, out=None) → Tensor

TensorFlow

tf.math.<name>(x, y, name=None) → Tensor

The minimum common API across most libraries is

<name>(x1, x2, out=None)

For example,

add(x1, x2, out=None)
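
For illustration, here is that minimum common signature exercised in NumPy (one of the surveyed libraries), where out may currently be passed positionally:

import numpy as np

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([4.0, 5.0, 6.0])

out = np.empty_like(x1)
result = np.add(x1, x2, out)  # NumPy accepts `out` positionally today
assert result is out          # the sum is written into and returned as `out`
print(out)                    # [5. 7. 9.]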

Proposal

Signature of the form:

<name>(x1, x2, *, out=None)

APIs:

add(x1, x2, *, out=None)
subtract(x1, x2, *, out=None)
multiply(x1, x2, *, out=None)
divide(x1, x2, *, out=None)

Notes

Optional arguments are keyword-only for the following reasons:

  1. Avoid potential positional variation amongst library implementations.
  2. Favor explicit interfaces and minimize readers' need to intuit an optional positional argument's meaning (see the sketch below).
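
A minimal sketch of the proposed keyword-only signature, using NumPy as a stand-in backend (the wrapper and its body are illustrative, not part of the proposal):

import numpy as np

def add(x1, x2, *, out=None):
    # `out` is keyword-only, so positional variation across library
    # implementations is ruled out by construction.
    return np.add(x1, x2, out=out)

x1 = np.arange(3)
x2 = np.ones(3, dtype=x1.dtype)
y = np.empty(3, dtype=x1.dtype)

add(x1, x2, out=y)  # OK: `out` must be named at the call site
# add(x1, x2, y)    # TypeError: add() takes 2 positional arguments but 3 were given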

kgryte (Contributor, Author) commented Jul 30, 2020

See #12 for a draft specification.

shoyer (Contributor) commented Jul 30, 2020

Meta note: it might be more descriptive to call these "binary arithmetic operations". An operation like abs(x) or -x is also arguably "arithmetic".

My discussion about positional-only arguments and out from #8 is equally relevant here -- we should make a shared decision for both.

kgryte (Contributor, Author) commented Jul 30, 2020

@shoyer Re: naming. The operations in this proposal are not intended to be exclusive. My intent with the issue name was simply to distinguish it from the other proposals, but point taken; this informs how we might categorize APIs upon formal inclusion in the specification.

kgryte (Contributor, Author) commented Jul 30, 2020

And agreed regarding out.

@rgommers This may be a good topic of discussion to add to the agenda for the next meeting.

shoyer (Contributor) commented Aug 19, 2020

Taking a step back: do we need these binary arithmetic operations at all when we have access to Python's infix operators?

I'm glad we have codified these semantics, but perhaps it would suffice to recommend using either +/+= or operator.add/operator.iadd rather than defining your own add function? At the very least, we should formalize the relationship between these.
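
For concreteness, a small sketch (assuming NumPy arrays) of the equivalences being suggested:

import operator
import numpy as np

x1 = np.arange(3)
x2 = np.ones(3, dtype=x1.dtype)

# operator.add(a, b) is exactly a + b:
assert np.array_equal(operator.add(x1, x2), x1 + x2)

# operator.iadd(a, b) performs a += b and returns a:
y = x1.copy()
assert np.array_equal(operator.iadd(y, x2), x1 + x2)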

rgommers (Member) commented:

That's a good point. I personally can't remember having ever used np.add over +.

Did a quick search of the SciPy code base, and the only uses of np.add are via np.add.reduce, and that's because at some point in the past np.add.reduce was significantly faster than np.sum (illustrated below).
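
For reference, a small illustration (mine, not drawn from the SciPy code base) of the np.add.reduce idiom:

import numpy as np

x = np.arange(10)

# np.add is a ufunc, so it exposes ufunc methods such as .reduce;
# np.add.reduce(x) computes the same value as np.sum(x).
assert np.add.reduce(x) == np.sum(x) == 45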

That said, PyTorch uses torch.add all over the place, also without out or another keyword. It's typed more strictly than __add__, so there's probably a reason.

shoyer (Contributor) commented Aug 19, 2020

> That said, PyTorch uses torch.add all over the place, also without out or another keyword. It's typed more strictly than __add__, so there's probably a reason.

tensorflow.add() is also used all over the place. As far as I can tell, it's not for any particularly good reason -- mostly just copied from some early examples, possibly examples written by coders who didn't know Python terribly well. It does have an optional name parameter, but that's rarely used.

In theory, tf.add() does have slightly stricter typing semantics (everything gets converted into a Tensor), but TensorFlow is in the process of adding its own dispatch system with __tf_dispatch__, so this will also change.
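
A quick illustration of that conversion behavior, assuming TensorFlow 2.x with eager execution (the op name below is arbitrary):

import tensorflow as tf

# tf.add converts plain Python scalars into Tensors before adding;
# the optional `name` argument only labels the op for graph tooling.
result = tf.add(1, 2, name="example_add")
print(type(result))    # an EagerTensor under TF 2.x eager execution
print(result.numpy())  # 3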

I know PyTorch also has experimental dispatch, so I suspect the situation there could be pretty similar.

kgryte (Contributor, Author) commented Aug 20, 2020

Downstream libraries do use, e.g., numpy.add (see here and here). An example of numpy.add usage in the pandas test suite can be found here.

Apart from stricter typing semantics, functional equivalents may be preferred over operator equivalents for purposes of fluent interfaces, lazy evaluation, etc., so I would advise against recommending operator equivalents and leaving element-wise arithmetic interfaces unstandardized. These interfaces are widely implemented among the analyzed array libraries.

shoyer (Contributor) commented Aug 20, 2020

> Downstream libraries do use, e.g., numpy.add (see here and here). An example of numpy.add usage in the pandas test suite can be found here.

This test is explicitly checking overrides of NumPy's ufuncs (which np.add is), so I don't think this is a great example.

> Apart from stricter typing semantics, functional equivalents may be preferred over operator equivalents for purposes of fluent interfaces, lazy evaluation, etc.

operator.add(a, b) from Python's standard library is a builtin function that is exactly equivalent to a + b. We don't need something new for that.

> These interfaces are widely implemented among the analyzed array libraries.

True, but many of these libraries, including CuPy, Dask and JAX, try to copy the NumPy interface exactly. A function like add() existing may be more of an indication that it was easy to add than an indication that it is actually useful.

(Note that your list is missing dask.array.add, but that does seem to exist.)

kgryte (Contributor, Author) commented Aug 20, 2020

@shoyer I don't see dask.array.add in its docs. Perhaps you can point me to where I can find it? If it exists (apart from as a dunder method), I'd like to update the comparison data.

Re: pandas examples. I simply pulled one usage from a GitHub search. Other examples include here and here. You may be able to find others, or they may all fall into the same category. All this to say: downstream libraries do use these functional equivalents. According to our record data, every downstream library we've analyzed (pandas, dask.array, xarray, scikit-image, matplotlib) invoked add, subtract, divide, and/or multiply when we ran its test suite.

Re: operator.add. Aware.

Re: other array libraries. Would be good to have some record data for API consumption beyond NumPy. But this is still a WIP.

kgryte (Contributor, Author) commented Aug 20, 2020

As a further comment, I think it's worth reiterating the main stated goal of the consortium, which is to coalesce around what is presently implemented across array libraries. Meaning, we aren't writing an array library spec from first principles.

As such, we'd need to ask: if we left add, subtract, multiply, and divide out of the spec, what would be the desired outcome? Would NumPy simply drop support for element-wise arithmetic interfaces (meaning that numpy.add would no longer exist)? Or would NumPy retain these interfaces? And if retained, would other array libraries continue to provide such interfaces (potentially following NumPy's lead)? If the answer to the last two questions is "yes", then what we're after here is ensuring uniformity amongst array libraries so that users/devs can expect similar signatures as they move from library to library.

I recognize, @shoyer, that your concern is forward-looking. You'd rather not impose undue burdens on future array libraries. However, unless most/all of the currently analyzed array libraries remove these interfaces, we're left with de facto standardization rather than de jure, and without the consistency guarantees afforded by specification compliance.

saulshanabrook (Contributor) commented Aug 20, 2020

Well, one advantage of not having them included, even if all array libraries continue to provide them, is that downstream users writing to the spec won't use them. So it could have some influence there, if we think users should be doing a + b over np.add(a, b) because it's more Pythonic, and there should be one and only one way to do things?

shoyer (Contributor) commented Aug 20, 2020

> @shoyer I don't see dask.array.add in its docs. Perhaps you can point me to where I can find it? If it exists (apart from as a dunder method), I'd like to update the comparison data.

It may not be documented, but it definitely exists:

In [4]: import dask.array

In [5]: dask.array.add(1, 1)
Out[5]: 2

In [6]: type(dask.array.add)
Out[6]: dask.array.ufunc.ufunc

> As such, we'd need to ask: if we left add, subtract, multiply, and divide out of the spec, what would be the desired outcome? Would NumPy simply drop support for element-wise arithmetic interfaces (meaning that numpy.add would no longer exist)? Or would NumPy retain these interfaces?

NumPy is certainly not going to drop numpy.add, regardless of what we decide here, but I don't think that should be remotely relevant for our decision-making here.

There is guaranteed to be a large list of functionality in all of these array libraries that doesn't get standardized. If we insist that libraries remove existing functionality, this spec would never get off the ground.

If our spec is open to extensibility, then I can see a case for allowing optional functions for arithmetic, e.g., so TensorFlow can expose the optional name parameter. And certainly, if a project like TensorFlow provides an add() function, its default behavior should be guaranteed to exactly match +.

shoyer (Contributor) commented Aug 20, 2020

> As a further comment, I think it's worth reiterating the main stated goal of the consortium, which is to coalesce around what is presently implemented across array libraries. Meaning, we aren't writing an array library spec from first principles.

It would be worth stating this clearly somewhere. I couldn't find it in the announcement blog post or in any of the written documents.

I agree with not inventing APIs from first principles, but I think it would be a mistake to blindly copy redundant features just because they exist in all of the libraries investigated. In my opinion, if there are no use cases aside from backwards compatibility, it shouldn't exist in our standard.

kgryte (Contributor, Author) commented Aug 20, 2020

Re: removing APIs. My point was not that this is what we'd advocate for, but that these arithmetic functions will live on and continue to be implemented (possibly due to prior art, as is the case now), whether or not we include them in the spec, owing to their "fundamental" nature. If our stance is that the spec is strictly about what "ought" to be the standard (prescriptive), instead of codifying (and, in some cases, normalizing) the fundamental APIs which already exist (descriptive), we miss the opportunity to ensure uniform expectations across array libraries.

Re: large list of functionality. Agreed and recognized. Note, however, that the operations listed here, while arguably redundant, were included because (a) they are implemented across all analyzed array libraries and (b) they are used by downstream libraries according to our data. If they had no usage, they would not have been proposed for specification.

It is possible that, if add et al. were not in NumPy, no other array library would include them, and it is possible that all array libraries view the operations as redundant (and only grudgingly implemented them), but I would prefer to see data confirming this. Array libraries were never obligated to follow NumPy, and certainly many (e.g., TensorFlow and PyTorch) have chosen to chart their own paths, so I am hesitant to subscribe to the notion that every array library "blindly" copies redundant features just because NumPy has them. In which case, I'd like to hear more feedback from the respective libraries about their motivations for including these APIs. Given that these APIs are widely used by downstream libraries (at least according to record data), I would like to assume the position of (devil's) advocate on behalf of these users and hear the positive case for inclusion.

Re: backward compatibility. This is actually something deserving of a larger discussion, both in terms of the current state of the art and the future evolution of the specification. Personally, I take a much stronger stance in favor of backward compatibility, even if it entails redundant functionality, in order to minimize harm. Meaning, backward compatibility can be reason enough. But this is something we must collectively discuss and decide.

Re: stated goal. Perhaps not succinctly stated as such, but this goal runs through the blog post, particularly the sections describing "what is an API standard", "Be conservative in choices made", and "A data-driven approach", and it is discussed again in the tooling repo README.

rgommers (Member) commented:

> I agree with not inventing APIs from first principles, but I think it would be a mistake to blindly copy redundant features just because they exist in all of the libraries investigated. In my opinion, if there are no use cases aside from backwards compatibility, it shouldn't exist in our standard.

I do tend to agree with this.

I think one very important point that we need to emphasize is what it means for something to not be included in the spec. We had the same discussion with complex dtypes, where Travis thought it was a big issue, but really it doesn't matter all that much. All it indicates is "you cannot write code that will work the same with all array libraries using complex dtypes at this point in time".

In this case, there is some harm, but it's very minor. No library is going to remove anything; it just means that in case you have a way of retrieving only what's in the standard (which I hope all libraries will add), add won't be included. No functionality is lost if the signature is only add(x1, x2, out=None). So the one thing people may have to do, when trying to make existing code spec-compliant, is change some lines of code from add(x1, x2) to x1 + x2.

That said, there is some overhead in operator functions:

In [1]: import operator                                                                                                  

In [2]: x1 = np.arange(10)                                                                                               

In [3]: x2 = x1                                                                                                          

In [4]: %timeit operator.add(x1, x2)                                                                                     
535 ns ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit np.add(x1, x2)                                                                                           
476 ns ± 8.88 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: %timeit operator.add(x1, x2)                                                                                     
530 ns ± 2.89 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [7]: %timeit np.add(x1, x2)                                                                                           
471 ns ± 4.46 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

I don't think anyone is going to care about or even notice those ~50 ns, though, given that the overhead on NumPy ufuncs is an order of magnitude larger, and torch.add's is several times larger still - I suspect TensorFlow et al. are in a similar situation.

So the one situation where the API gets worse is when people write:

for op in (np.sum,
           np.mean,
           ...
           np.add):
    ...

There, operator.add would look uglier, as the sketch below shows.
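
A hypothetical side-by-side of the two styles (illustrative only):

import operator
import numpy as np

x1 = np.arange(10)
x2 = np.ones(10, dtype=x1.dtype)

# With the functional APIs, every entry in the tuple is a NumPy callable:
for op in (np.add, np.subtract, np.multiply):
    print(op(x1, x2))

# With the operator module, the builtins stick out amongst NumPy names:
for op in (operator.add, operator.sub, operator.mul):
    print(op(x1, x2))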

> In theory, tf.add() does have slightly stricter typing semantics (everything gets converted into a Tensor), but TensorFlow is in the process of adding its own dispatch system with __tf_dispatch__, so this will also change.

Thanks, interesting pointer for TensorFlow! Regarding the decision here, I'm not sure dispatch matters. The dispatcher should still end up in the correct __add__ or __radd__ method, just like when using +.

tl;dr I'm not convinced either way yet.

saulshanabrook (Contributor) commented:

> Thanks, interesting pointer for TensorFlow! Regarding the decision here, I'm not sure dispatch matters. The dispatcher should still end up in the correct __add__ or __radd__ method, just like when using +.

I don't think this has been noted explicitly yet, but I think another difference is that np.add can coerce the output type to an array, regardless of input types, whereas operator.add might not. I.e., if both of my inputs may be either ints or arrays, the behavior will differ between the two functions (see the sketch below).

It seems like you can kind of get around this by wrapping the result in asarray, or whatever the method is to create an array from a primitive type?
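
A minimal sketch of the difference (NumPy-specific; the exact scalar type returned for int inputs is platform-dependent):

import operator
import numpy as np

a, b = 2, 3  # plain Python ints

print(type(operator.add(a, b)))  # <class 'int'> -- stays a Python scalar
print(type(np.add(a, b)))        # <class 'numpy.int64'> -- a NumPy scalar type

# Wrapping the result in asarray recovers array semantics:
print(type(np.asarray(operator.add(a, b))))  # <class 'numpy.ndarray'>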

kgryte (Contributor, Author) commented Aug 20, 2020

@shoyer Thanks for all your input on this. Your feedback has been really helpful in clarifying our thinking and in ensuring we consider the broader picture.

Based on the discussion in today's meeting, beyond saying that the results of x1 + x2 should exactly match the results of add(x1, x2), is there anything else you feel the spec should include regarding the relationship between the functional APIs and their equivalent operators? Do you have particular phrasing in mind?
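
For instance, that guarantee could be checked mechanically; a hypothetical conformance-style test, with xp standing in for any compliant namespace:

import numpy as np

xp = np  # stand-in for any spec-compliant array namespace

x1 = np.arange(5, dtype=np.float64)
x2 = np.arange(5, 10, dtype=np.float64)

# The proposed guarantee: each functional API exactly matches its operator.
pairs = [(xp.add, x1 + x2), (xp.subtract, x1 - x2),
         (xp.multiply, x1 * x2), (xp.divide, x1 / x2)]
for func, via_operator in pairs:
    assert np.array_equal(func(x1, x2), via_operator)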

rgommers (Member) commented:

I added an entry to the tracking issue for Python operators, dunder methods and equivalent library functions/methods.

The rest is done here, so closing.
