Skip to content

implement Gradient #2398

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Sep 21, 2018
Merged

implement Gradient #2398

merged 27 commits into from
Sep 21, 2018

Conversation

fujiisoup
Copy link
Member

Added xr.gradient, xr.DataArray.gradient, and xr.Dataset.gradient according to #1332.

* x (x) float64 0.0 0.1 1.1 1.2
Dimensions without coordinates: y
>>>
>>> xr.gradient(da, ('x', 'y'))
Copy link
Member Author

@fujiisoup fujiisoup Sep 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Altough this API is similar to numpy's counterpart, I'm wondering whether we really need to support this. The return value is a tuple of DataArrays, which I feel inconsistent to other xarray functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I wrote my own wrapper for gradient of a DataArray, I returned a dataset with each variable being a gradient along one dimension, but even that isn't very elegant ¯_(ツ)_/¯

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be another option, but in order to store them to a dataset, we will need to define their names very heuristically...

Maybe we can drop this api as this does not speed up the computation and it's easy to do the same thing manually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I don't think it's particularly useful to compute to compute independent gradients with respect to multiple axes at the same time, e.g., du/dx, du/dy, etc. If anything, I would be more excited about computing higher order derivatives, e.g., \frac{d^2 u}{dx dy}

@@ -104,3 +104,49 @@ def func(x, window, axis=-1):
index = (slice(None),) * axis + (slice(drop_size,
drop_size + orig_shape[axis]), )
return out[index]


def gradient(a, coord, axis, edge_order):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be implemented in the upstream...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean like this? Requires Dask 0.17.3+ though. So IDK if that works for you or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jakirkahm.
I need to compute the gradient with a non uniform coordinate, but dask Array.gradient only supports uniformly spacing coordinate.

I think this can be implemented using overlap as I do here. Is it in the scope of the dask development?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry. Should have read more of the code.

Yeah I think that is in scope. Just not currently supported yet.

Could you please open an issue or perhaps PR over on the Dask repo?

def gradient(f, *varargs, **kwargs):
"""
Return the gradient of an N-dimensional array.
The gradient is computed using second order accurate central differences
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E501 line too long (80 > 79 characters)

"""
Return the gradient of an N-dimensional array.
The gradient is computed using second order accurate central differences
in the interior points and either first or second order accurate one-sides
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E501 line too long (82 > 79 characters)

f : array_like
An N-dimensional array containing samples of a scalar function.
varargs : list of scalar or array, optional
Spacing between f values. Default unitary spacing for all dimensions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E501 line too long (81 > 79 characters)

Spacing between f values. Default unitary spacing for all dimensions.
Spacing can be specified using:
1. single scalar to specify a sample distance for all dimensions.
2. N scalars to specify a constant sample distance for each dimension.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E501 line too long (82 > 79 characters)

3. N arrays to specify the coordinates of the values along each
dimension of F. The length of the array must match the size of
the corresponding dimension
4. Any combination of N scalars/arrays with the meaning of 2. and 3.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E501 line too long (80 > 79 characters)


# Difference of datetime64 elements results in timedelta64
if otype == 'M':
# Need to use the full dtype name because it contains unit information
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E501 line too long (82 > 79 characters)

for i, axis in enumerate(axes):
if y.shape[axis] < edge_order + 1:
raise ValueError(
"Shape of array too small to calculate a numerical gradient, "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E501 line too long (82 > 79 characters)

else:
dx1 = dx[i][0:-1]
dx2 = dx[i][1:]
a = -(dx2)/(dx1 * (dx1 + dx2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E226 missing whitespace around arithmetic operator

shape = np.ones(N, dtype=int)
shape[axis] = -1
a.shape = b.shape = c.shape = shape
# 1D equivalent -- out[1:-1] = a * f[:-2] + b * f[1:-1] + c * f[2:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E501 line too long (83 > 79 characters)

else:
dx1 = dx[i][0]
dx2 = dx[i][1]
a = -(2. * dx1 + dx2)/(dx1 * (dx1 + dx2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E226 missing whitespace around arithmetic operator

@shoyer
Copy link
Member

shoyer commented Sep 4, 2018

I think this will be very welcome functionality!

I wonder if we should consider calling this xarray.differentiate instead of xarray.gradient. I think the NumPy function is poorly named for differentiating along a single axis at once.

depth = {d: 1 if d == axis else 0 for d in range(a.ndim)}
# temporary pad zero at the boundary
boundary = "none"
ag = overlap(a, depth=depth, boundary=boundary)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F841 local variable 'ag' is assigned to but never used

boundary = "none"
ag = overlap(a, depth=depth, boundary=boundary)

n_chunk = len(a.chunks[axis])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F841 local variable 'n_chunk' is assigned to but never used

array_loc_start = array_loc_stop - np.array(a.chunks[axis]) - 2
array_loc_stop[-1] -= 1
array_loc_start[0] = 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W293 blank line contains whitespace

return grad

return a.map_overlap(
func,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E126 continuation line over-indented for hanging indent

@fujiisoup
Copy link
Member Author

I wonder if we should consider calling this xarray.differentiate instead of xarray.gradient. I think the NumPy function is poorly named for differentiating along a single axis at once.

Agreed. Differentiate is nicer.

Some other api questions arised during the implementation

  • Do we support differentiate for Dataset? In that case, what should we do for the variables that are independent from the target coordinate?
    I thought 'keep them as is' is intuitive (and I implemented so), but mathematically, they should be zero.

  • Do we need to sort the array before computing differentiate? np.gradient implicitly assumes the array is sorted (but do nothing about this).

from numbers import Integral, Real


def validate_axis(axis, ndim):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E303 too many blank lines (2)

if not isinstance(axis, Integral):
raise TypeError("Axis value must be an integer, got %s" % axis)
if axis < -ndim or axis >= ndim:
raise AxisError("Axis %d is out of bounds for array of dimension %d"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F821 undefined name 'AxisError'
E501 line too long (80 > 79 characters)

return axis


def _gradient_kernel(x, block_id, coord, axis, array_locs, grad_kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E303 too many blank lines (2)

return grad


def gradient(f, *varargs, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E303 too many blank lines (2)

@@ -3,7 +3,7 @@

import numpy as np

from . import nputils
from . import nputils, npcompat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F401 '.npcompat' imported but unused

@fujiisoup
Copy link
Member Author

any thoughts for this?

@dopplershift
Copy link
Contributor

Why would you sort the array? Aren't you taking differences of values and dividing by differences between the matching coordinates?

@fujiisoup
Copy link
Member Author

Thanks, @dopplershift .

Aren't you taking differences of values and dividing by differences between the matching coordinates?

Yes, correct. But if we have closer data points, then the estimate of the gradient becomes more precise. Sorting the array according to the coordinate provides the closest points, resulting in the most precise estimate of the gradient.

But I also think users can do it manually before taking the gradient.


if not isinstance(dataarray, DataArray):
raise TypeError(
'Only DataArray is supported. Given {}.'.format(type(dataarray)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused here. You wrote an implementation for Dataset.differentiate, too, and here is a duplicate version of DataArray.differentiate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Maybe the top level function is not necessary here and DataArray.differentiate and Dataset.differentiate would be sufficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I don't think we need the separate function.

array = array - np.min(array)
else:
array = array - np.zeros(array.shape, dtype=array.dtype)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W293 blank line contains whitespace

npcompat.gradient(
da, utils.to_numeric(
da['x'], offset=True, time_unit='D'), axis=0, edge_order=1),
dims=da.dims, coords=da.coords)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E131 continuation line unaligned for hanging indent

The coordinate to be used to compute the gradient.
edge_order: 1 or 2. Default 1
N-th order accurate differences at the boundaries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs time_unit here

coord_data = coord_var.data
if coord_data.dtype.kind in ['m', 'M']:
if time_unit is None:
time_unit = np.datetime_data(coord_data.dtype)[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use tuple unpacking here, e.g., time_unit, _ = np.datetime_data(coord_data.dtype)


@pytest.mark.parametrize('dask', [True, False])
@pytest.mark.parametrize('edge_order', [1, 2])
def test_gradient(dask, edge_order):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the long term, hopefully we'll solve this by getting real unit support working, but a time_unit argument seems like a good choice for now

@fujiisoup
Copy link
Member Author

I think it's ready :)

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fujiisoup I'm really looking forward to having this built in to xarray! Thanks for the extra work in making things dask-compatible as well.

Eventually it would be nice if this worked on DataArrays with cftime.datetime coordinates; I think it would be relatively straightforward to modify to_numeric to enable it (we could probably enable it for interp at the same time), but I can take care of that later if you'd like.

@@ -36,6 +36,10 @@ Documentation
Enhancements
~~~~~~~~~~~~

- :py:func:`~xarray.differentiate`, :py:meth:`~xarray.DataArray.differentiate`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think xarray.differentiate is no longer added.

'us', 'ns', 'ps', 'fs', 'as'}
dtype: target dtype
"""
if array.dtype.kind not in ['m', 'M']:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like everywhere to_numeric is called, this check is already made. Is it redundant to check again here (or conversely are the checks before the function is called redundant)?


.. ipython:: python
a = xr.DataArray(np.arange(8).reshape(4, 2), dims=['x'],
coords=[0.1, 0.11, 0.2, 0.3])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example raises an error when you run it -- I think you need to supply two dimension names.


Xarray objects have some handy methods for the computation with their
coordinates. :py:meth:`~xarray.DataArray.differentiate` computes derivatives by
finite central differences using their coordinates,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"finite central differences" -> "centered finite differences"

@fmaussion
Copy link
Member

This is so great! This is going to simplify my classes about derivatives on the globe even more: my only concern is that students will soon forget what a numerical derivative actually is, and that it's not a trivial to implement ;-)

@shoyer
Copy link
Member

shoyer commented Sep 19, 2018

Speaking of derivatives on the globe, we might want to include an option for periodic boundary conditions (and possibly for other functions like interp as well). But we can save that for another PR :).

@fmaussion
Copy link
Member

But we can save that for another PR :)

See also #1288 : integrate is the next on my list ;) - I can try to give it a go if @fujiisoup doesn't want to do it himself

@jakirkham
Copy link

@shoyer, have you seen da.pad?

@shoyer
Copy link
Member

shoyer commented Sep 19, 2018

@shoyer, have you seen da.pad?

Yes, pad solves the hard part of periodic boundary conditions -- we should just need to do a bit of book-keeping on the xarray side.

@rabernat
Copy link
Contributor

I agree that the features implemented here are broadly useful and belong in xarray. The fundamental question is: how far does xarray itself want to go in supporting vector calculus in curvilinear coordinates (i.e. on the sphere).

There is a fair bit of overlap between this new functionality and some of the things that we are trying to do in xgcm: https://xgcm.readthedocs.io/en/latest/grids.html. Xgcm supports periodic boundary conditions, as well as more complex topological connections between array edges. It allows users to reproduce precisely the sort of operations used to take gradients in finite-volume staggered-grid models.

@shoyer
Copy link
Member

shoyer commented Sep 19, 2018

@rabernat my inclination would be to draw the line at regular grids (with or without periodic boundary conditions), which are pretty commonly encountered in data analysis for any continuous physical systems. We would leave non-regular cases like staggered-grids to add-ons like xgcm -- these grids tend to be more application/PDE specific.

@shoyer
Copy link
Member

shoyer commented Sep 19, 2018

We should definitely mention specialized tools like xgcm as alternatives in the docstring for these xarray methods.

@fmaussion fmaussion mentioned this pull request Sep 19, 2018
@fujiisoup
Copy link
Member Author

Thanks all. Updated.

Thanks, @spencerkclark

Eventually it would be nice if this worked on DataArrays with cftime.datetime coordinates; I think it would be relatively straightforward to modify to_numeric to enable it (we could probably enable it for interp at the same time), but I can take care of that later if you'd like.

Thanks. I added this function for something like this extension, though I do not yet fully follow your cftime update. It would be super nice if you could take care of this after merge.

@shoyer ,

we might want to include an option for periodic boundary conditions

Agreed. This option is nice not only differentiate but also interp and rolling.
I think we can add a common logic to take care of them.

@fmaussion

See also #1288 : integrate is the next on my list ;) - I can try to give it a go if @fujiisoup doesn't want to do it himself

Thanks.
Actually, I am now moving to another nation and would not have enough time in a few weeks.
I will appreciate if you could take care of this.

@spencerkclark
Copy link
Member

I added this function for something like this extension, though I do not yet fully follow your cftime update. It would be super nice if you could take care of this after merge.

I totally understand; cftime objects probably seem fairly esoteric to non climate scientists. I'll be happy to take care of this after this gets merged. Indeed having the relevant logic already contained in to_numeric is very convenient.

@shoyer
Copy link
Member

shoyer commented Sep 19, 2018

I'm going to merge this after the Appveyor tests pass, unless there are any further objections.

@shoyer
Copy link
Member

shoyer commented Sep 20, 2018

@rabernat let me know if this makes sense to you / if you agree.

Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm late to the party on reviewing this. It looks like a great feature, and I know @fujiisoup has put a huge amount of work into it! 👏

My only concern is that the first thing most geoscience users will try with this function is to take a field with lat, lon dimensions and call differentiate on it, expecting to obtain the zonal and meridional gradients.

I feel that some mention is needed in the documentation that this feature is limited to simple cartesian geometry, in which the physical locations of the variables are accurately described by the dimension coordinates.

coords=[0.1, 0.11, 0.2, 0.3])
a.differentiate('x')


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a place where we could mention the limitations of differentiate and gradient for non-cartesian geometry.

@@ -2289,6 +2292,57 @@ def rank(self, dim, pct=False, keep_attrs=False):
ds = self._to_temp_dataset().rank(dim, pct=pct, keep_attrs=keep_attrs)
return self._from_temp_dataset(ds)

def differentiate(self, coord, edge_order=1, datetime_unit=None):
""" Differentiate the array with the second order accurate central
differences.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

@@ -3663,6 +3666,58 @@ def rank(self, dim, pct=False, keep_attrs=False):
attrs = self.attrs if keep_attrs else None
return self._replace_vars_and_dims(variables, coord_names, attrs=attrs)

def differentiate(self, coord, edge_order=1, datetime_unit=None):
""" Differentiate with the second order accurate central
differences.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

@fujiisoup
Copy link
Member Author

Thanks, @rabernat for the review.

Added the limitation of this method to docs.

Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants