Add trapz to DataArray for mathematical integration #1288
Comments
I don't at the moment see a reason to use a different API than the NumPy/SciPy functions.
I agree that the API should mostly copy the NumPy/SciPy functions. As long as there isn't something else we'd want to reserve the name for, I like the sound of `integrate`. It looks like SciPy implements Simpson's rule with the same API (see `scipy.integrate.simps`), so that would be easy to support, too. Given how prevalent SciPy is these days, I would have no compunctions about making scipy required for this method and defaulting to Simpson's rule. It would be useful to have dask.array versions of these functions, too, but that's not essential for a first pass.

CC @spencerahill @rabernat @lesommer in case any of you have opinions about this.
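For reference, the two library routines mentioned here already share a call signature, which is what would make a single `integrate` entry point easy to dispatch (a small illustration of the underlying calls, not the proposed xarray API):

```python
import numpy as np
from scipy.integrate import simps

x = np.linspace(0.0, np.pi, 101)
y = np.sin(x)

print(np.trapz(y, x=x))  # trapezoidal rule, close to the exact value of 2
print(simps(y, x=x))     # Simpson's rule, same signature, more accurate here
```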
+1 for this. The cumulative integral is of very frequent use in the atmospheric sciences, too: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.integrate.cumtrapz.html
An `integrate` method sounds good. With a general `integrate` method, other schemes (like Simpson's rule) could be supported later through a `method` keyword.
Having an xarray wrapper on these functions would certainly be useful. I will also make the same comment I always make when such feature requests are raised: yes, it always seems desirable to add new features to xarray on a function-by-function basis. But where does it end? Why not implement the rest of the scipy.ode module? And why stop there? As a community we need to develop a roadmap that clearly defines the scope of xarray. Once we have that, decisions like this one become easier to make.
As usual @rabernat raises some excellent points!
Yes, this is a totally valid concern, if a user might expect `integrate` to cover everything in `scipy.integrate`. One point in favor of calling this `integrate` rather than `trapz` is that it names the operation rather than the particular algorithm.

Looking at the rest of `scipy.integrate`, I don't think there is much else we would want to wrap directly.

I doubt we'll be able to come up with hard and fast rules, but maybe we can enumerate some principles for what belongs in xarray proper.

And I'm fine with `integrate`.
👍 for the functionality (both `trapz` and `cumtrapz`). Another way to look at it is that methods are there to encapsulate some kind of manipulation of internal state or to ensure that some kind of invariant is maintained. I don't see what we gain from `da.integrate(dim='x', method='trapezoidal')` instead of `integrate(da, dim='x', method='trapezoidal')`.

If you want to see what the pathological case of putting everything as a method for convenience looks like, go look at all the plot methods on matplotlib's `Axes` class.

My real preference would just be to have this work:

```python
ds = xr.tutorial.load_dataset('rasm')
np.trapz(ds['Tair'], axis='x')
```

but I have no idea what that would take, so I'm perfectly fine with xarray gaining its own implementation.
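For what it's worth, the dimension-name-to-axis translation such a wrapper needs already exists as `DataArray.get_axis_num`, so a thin helper gets most of the way there today. A rough sketch (the `integrate` helper below is illustrative, not an existing or proposed API):

```python
import numpy as np
import xarray as xr

def integrate(da, dim):
    """Trapezoidal integration of a DataArray along a named dimension.

    Minimal sketch: uses the dimension's coordinate as x when present and
    returns a DataArray with that dimension removed.
    """
    axis = da.get_axis_num(dim)
    x = da.coords[dim].values if dim in da.coords else None
    data = np.trapz(da.values, x=x, axis=axis)
    dims = [d for d in da.dims if d != dim]
    coords = {k: v for k, v in da.coords.items() if dim not in v.dims}
    return xr.DataArray(data, dims=dims, coords=coords)

# usage: integrate sin(x) over [0, pi] along 'x'; each result is close to 2
x = np.linspace(0.0, np.pi, 100)
da = xr.DataArray(np.sin(x)[:, None] * np.ones(3), dims=('x', 'y'),
                  coords={'x': x})
print(integrate(da, 'x'))
```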
I can see the problems down the road that @rabernat brings up. Say you have a high-order finite volume discretization and some numerical implementation of high-order integration for that gridding. What would your interface be? You could write it as yet another `method` option, but that gets awkward quickly. That might bring us back to the algorithmically descriptive name `trapz`.
I would also like to see a cumulative integral. Here is the xarray-based implementation I currently use:

```python
def cumtrapz(A, dim):
    """Cumulative trapezoidal rule (aka Tai's method).

    Notes
    -----
    The trapezoidal rule approximates the integral as

        int f(x) dx ~= sum_i (f_i + f_{i+1}) * dx_i / 2
    """
    x = A[dim]
    dx = x - x.shift(**{dim: 1})
    dx = dx.fillna(0.0)
    return ((A.shift(**{dim: 1}) + A) * dx / 2.0).fillna(0.0).cumsum(dim)
```
Sorry for letting this lapse. Yes, we absolutely want this functionality in some form.
This is a fair point, and I agree with you from a purist OO-programming/software-engineering perspective (TensorFlow, for example, takes this approach). But with xarray, we have been taking a different path, putting methods on objects for the convenience of method chaining (like pandas). So from a consistency perspective, I think it's fine to keep these as methods. This is somewhat similar even to NumPy, where a number of the most commonly used functions are also methods.
I don't see a big advantage to adding such an extension point. Almost assuredly it's less text and more clear to simply write your own function for a specialized scheme like that.
I normally don't like adding flags for switching functionality entirely, but maybe that would make sense here if there's enough shared code (e.g., simply substituting a cumulative sum for the final sum). One thing that can be useful to do before writing code is to write out a docstring with all the bells and whistles we might eventually add. So let's give that a shot here and see if a single function can sensibly cover them all.
I could also imagine possibly adding a `cumulative` keyword argument.
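In that spirit, here is a hypothetical draft of what the signature and docstring might look like; the names, defaults, and options are guesses for discussion, not a settled API:

```python
def integrate(self, dim, method='trapezoidal', cumulative=False):
    """Integrate this DataArray along a dimension.

    Parameters
    ----------
    dim : str
        Dimension (with an associated coordinate) to integrate along.
    method : {'trapezoidal', 'simpson'}, optional
        Numerical integration rule. 'simpson' would require scipy.
    cumulative : bool, optional
        If True, return the running integral, with the same shape as the
        input and starting at 0 (like scipy.integrate.cumtrapz with
        initial=0). If False, return the definite integral with ``dim``
        removed.

    Returns
    -------
    integrated : DataArray
    """
```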
An argument against a single function is that the shape of the returned array is different in each case. Also, `cumtrapz` has an `initial` keyword which changes the shape of the returned array; it is currently set to None per default, but should be set to 0 per default IMO. If this is not a problem, I also like having one single function for integration (simpler from a user perspective).
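To make the shape point concrete, using the SciPy function linked above:

```python
import numpy as np
from scipy.integrate import cumtrapz

y = np.arange(5.0)
print(cumtrapz(y).shape)             # (4,): default initial=None drops one element
print(cumtrapz(y, initial=0).shape)  # (5,): same length as the input
```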
I usually agree that using too many (or any) switches within functions is not ideal. However, I think this is more important for low-level or internal routines. For user-facing interfaces, I think it is okay. After all, many numpy and scipy functions have convenient switches that control the return values.

By the way, the cumtrapz implementation I pasted above matches the scipy version when `initial=0`, which I also think would be a saner default for integration.

As far as implementation is concerned: is there any performance downside to using xarray's shift operators versus delving deeper into dask with `map_blocks`, etc.? I looked into using dask's `cumreduction` function, but am not sure it is possible to implement the trapezoid method in that way without changing dask.
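A quick way to check that claim, assuming the `cumtrapz` helper from the earlier comment is in scope:

```python
import numpy as np
import scipy.integrate
import xarray as xr

x = np.linspace(0.0, 2 * np.pi, 50)
da = xr.DataArray(np.sin(x), coords={'x': x}, dims='x')

result = cumtrapz(da, 'x')  # the shift/cumsum implementation above
expected = scipy.integrate.cumtrapz(da.values, x=x, initial=0)

np.testing.assert_allclose(result.values, expected)
```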
Yes, I agree with both of you that we should fix the default and use `initial=0`.
From a performance perspective, it would be totally fine to implement this either in terms of high-level xarray operations like `shift` and `cumsum`, or in terms of lower-level NumPy/dask operations.
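For the non-cumulative case the same high-level pattern works; a minimal sketch of a hypothetical helper built only from `shift`, arithmetic, and `sum`:

```python
def trapz(A, dim):
    """Definite trapezoidal integral of A along dim, using xarray ops only."""
    x = A[dim]
    dx = x - x.shift(**{dim: 1})          # first element along dim is NaN
    # the NaNs introduced by shift are skipped by sum() for float data
    return ((A.shift(**{dim: 1}) + A) * dx / 2.0).sum(dim)
```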
If you give a mouse a cookie, he'll ask for a glass of milk. There is a whole slew of NumPy/SciPy functions that would really benefit from using xarray to organize input/output. I've written wrappers for svd, fft, psd, gradient, and specgram, for starters. Perhaps a new package would be in order?
I would also be very happy to include many of these in a submodule inside xarray, e.g., something like `xarray.contrib`.
+1 for integrate. I found this thread when having the same problem.
Adding my +1 without offering to do the work. :) This would be very welcome!
Hello. I discovered xarray a few days ago, and find it very useful for my job. Integration along a coordinate is one of the few things I have found missing so far.
@lamorton I really like the suggestion from @shoyer about submodules for wrappers of other libraries, but in the meantime I would very much like to check out your implementations of those wrappers.
@gajomi I can find a place to upload what I have. I foresee some difficulty making a general wrapper due to the issue of naming conventions, but I like the idea too. Edit: Here's what I have so far ... YMMV, it's still kinda rough. https://github.com/lamorton/SciPyXarray
I would also be very interested in seeing your code, @lamorton. Overall, I think the xarray community could really benefit from some kind of centralized contrib package for this sort of wrapper.
I've also contributed to developing a Python package (xrft) for FFTs that keeps track of the metadata of multidimensional xarray datasets.
I opened #1850 to discuss xarray-contrib.
Since scientific data is often an approximation to a continuous function, when we write `mean()` or `sum()`, our underlying intention is often to approximate an integral. For example, if we have the temperature of a rod T(t, x) as a function of time and space, the time-averaged value Tavg(x) is the integral of T(t, x) with respect to t, divided by the total time.
I would guess that in practice, many uses of `mean()` and `sum()` are intending to approximate integrals of continuous functions. That is typically my use, at least. But simply adding up all values is a Riemann-sum approximation to an integral, which is not very accurate.

For approximating an integral, it seems to me that the trapezoidal rule (`trapz()` in numpy) should be preferred to `sum()` or `mean()` in essentially all cases, as the trapezoidal rule is more accurate while still being efficient.

It would be very useful to have `trapz()` as a method of DataArrays, so one could write, e.g., for an average value, `Tavg = T.trapz(dim='time') / totalTime`. Currently, I would have to use numpy's method and then rebuild the reduced-dimensional array myself.

It could even be useful to have a function like `mean_trapz()` that calculates the mean value based on trapz. More generally, one could imagine having other integration methods too, e.g. `data.integrate(dim='x', method='simpson')`. But `trapz` is probably good enough for many cases and a big improvement over `mean`, and `trapz` is very simple even for unequally spaced data. And `trapz` shouldn't be much less efficient in principle, although in practice I find `np.trapz()` to be several times slower than `np.mean()`.

Quick examples demonstrating `sum`/`mean` vs. `trapz` make the case for `trapz`; the advantage is especially striking for periodic functions, because the trapezoidal rule happens to be extremely accurate for periodic functions integrated over their period.
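A small example in this spirit (not the original snippets), integrating a periodic function over one period, where the exact answer is pi:

```python
import numpy as np

n = 20
x = np.linspace(0.0, 2 * np.pi, n)
y = np.cos(x) ** 2                     # integral over [0, 2*pi] is exactly pi

riemann = np.sum(y) * (x[1] - x[0])    # naive "sum times dx"
trapezoid = np.trapz(y, x)

print(abs(riemann - np.pi))            # roughly 0.3
print(abs(trapezoid - np.pi))          # near machine precision
```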