We need some way to identify non-index coordinates #197

shoyer · 2014-08-01T06:36:13Z

I am currently working with station data. In order to keep around latitude and longitude (I use station_id as the coordinate variable), I need to resort to some ridiculous contortions:

residuals = results['y'] - observations['y']
residuals.dataset.update(results.select_vars('longitude', 'latitude'))

There has got to be an easier way to handle this.

I don't want to revert to some primitive guessing strategy (e.g., looking at attrs['coordinates']) to figure out which extra variables can be safely kept after mathematical operations.

Another approach would be to try to preserve everything in the dataset linked to a DataArray when doing math. But I don't really like this option, either, because it would lead to serious propagation of "linked dataset variables", which are rather surprising and can have unexpected performance consequences (though at least they appear in repr as of #128).

This leads me to a final alternative: restructuring xray's internals to provide first-class support for coordinates that are not indexes. For example, this would mean promoting ds.coordinates to an actual dictionary stored on a dataset, and allowing it to hold objects that aren't an xray.Coordinate.

Making this change transparent to users would likely require changing the Dataset signature to something like Dataset(variables, coords, attrs). We might (yet again) want to rename Coordinate, to something like IndexVar, to emphasize the notion of "index" and "non-index" coordinates. And we could get rid of the terrible "linked dataset variable".

Once we have non-index coordinates, we need a policy for what to do when adding two DataArrays for which they differ. I think my preferred approach is to not enforce that they be found on both arrays, but to raise an exception if there are any conflicting values -- unless they are scalar valued, in which case they would be dropped, turned into a tuple, or given different names. (Otherwise there would be cases where you couldn't calculate x[1] - x[0].)

We might even be able to keep around multi-dimensional coordinates this way (e.g., 2D lat/lon arrays for projected data).... I'll need to think about that one some more.

shoyer · 2014-08-03T00:49:47Z

Some further thinking suggests that we can allow for multi-dimensional coordinates, but the only sane way to handle conflicting non-index coordinates is to drop them. Even Iris, with its strict interpretation of CF conventions, takes this approach.

Raising an exception for conflicting non-scalar variables would make multi-dimensional coordinates impractical.
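
The "drop on conflict" policy can be illustrated with a small sketch. This is not xray code: merge_nonindex_coords is a hypothetical helper, and plain dicts of Python values stand in for the non-index coordinates attached to each array. A coordinate found on only one operand is kept, a coordinate with equal values on both is kept, and a conflicting coordinate is silently dropped rather than raising.

```python
def merge_nonindex_coords(left, right):
    """Combine non-index coordinates of two operands (hypothetical helper).

    `left` and `right` map coordinate names to values. Conflicting
    entries are dropped instead of raising an exception.
    """
    merged = {}
    for name, value in left.items():
        # Keep the coordinate if it is absent from the other operand
        # or present there with an equal value.
        if name not in right or right[name] == value:
            merged[name] = value
    for name, value in right.items():
        # Coordinates that exist only on the right operand are kept too.
        if name not in left:
            merged[name] = value
    return merged


# Scalar coordinates left over after selecting x[0] and x[1]:
x0_coords = {"station_name": "A", "elevation": 10.0}
x1_coords = {"station_name": "B", "elevation": 10.0}

# station_name conflicts and is dropped; elevation agrees and survives.
result = merge_nonindex_coords(x0_coords, x1_coords)
```

With this rule x[1] - x[0] always succeeds, at the cost of losing any coordinate whose values disagree.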

shoyer · 2014-08-18T06:10:39Z

Here's my current thinking on implementation:

Dataset._coords_keys is a set that keeps track of the names of variables that are coordinates. This would let us implement Dataset.coords as a dict-like object with very little overhead for lookups.
DataArray.dataset should go away from the public API (use to_dataset() instead); more importantly, DataArray objects should only know about their coordinates, not any other aspects of the underlying dataset (it's a leaky abstraction).
Dataset.__iter__ should only iterate over non-coordinates; but __contains__ and __getitem__ should be unchanged.
Of course, we should automatically save and load coordinates according to CF conventions.
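
A rough sketch of the bookkeeping described above, under stated assumptions: ToyDataset is a made-up stand-in rather than the real Dataset, variables are plain Python lists, and the set of coordinate names is called _coord_names here. For brevity coords builds a new dict on each access, whereas the proposal is for a lightweight dict-like view.

```python
class ToyDataset:
    """Toy container that tracks which variables are coordinates."""

    def __init__(self, variables, coords=None, attrs=None):
        coords = dict(coords or {})
        # All variables live in one mapping; a set of names marks
        # which of them are coordinates.
        self._variables = {**variables, **coords}
        self._coord_names = set(coords)
        self.attrs = dict(attrs or {})

    @property
    def coords(self):
        # Simplification: a fresh dict rather than a live view.
        return {k: v for k, v in self._variables.items()
                if k in self._coord_names}

    def __iter__(self):
        # Iteration skips coordinates ...
        return (k for k in self._variables if k not in self._coord_names)

    def __contains__(self, key):
        # ... but membership and item lookup still see every variable.
        return key in self._variables

    def __getitem__(self, key):
        return self._variables[key]


ds = ToyDataset(
    {"y": [1.0, 2.0]},
    coords={"station_id": ["a", "b"], "latitude": [40.0, 41.0]},
)
data_names = list(ds)           # only the non-coordinate variable "y"
has_latitude = "latitude" in ds  # coordinates are still reachable by name
```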

Also, we need some new methods to make this workable (modeled off of pandas's set_index and reset_index):

Dataset.set_coords(keys, inplace=False) turns the variables named in keys into coordinates.
Dataset.reset_coords(keys=None, drop=False, inplace=False) turns the given coordinates (all of them if keys=None) back into variables, or removes them entirely if drop=True.
DataArray.reset_coords() would be very similar to the dataset method; it would return a Dataset unless drop=True (in which case it would return another DataArray).
set_index(keys, inplace=False) should be both a DataArray and Dataset method.
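
To make the intended semantics of set_coords and reset_coords concrete, here is a minimal sketch written as free functions over a dict of variables and a set of coordinate names. The function bodies, the tuple return value, and the omission of inplace are assumptions made for illustration; they are not the eventual implementation.

```python
def set_coords(variables, coord_names, keys):
    """Return (variables, coord_names) with `keys` marked as coordinates."""
    missing = [k for k in keys if k not in variables]
    if missing:
        raise KeyError(f"not variables in this dataset: {missing}")
    return dict(variables), set(coord_names) | set(keys)


def reset_coords(variables, coord_names, keys=None, drop=False):
    """Demote coordinates to ordinary variables, or drop them entirely."""
    # keys=None means "reset every coordinate".
    keys = set(coord_names) if keys is None else set(keys)
    unknown = keys - set(coord_names)
    if unknown:
        raise KeyError(f"not coordinates: {sorted(unknown)}")
    remaining = set(coord_names) - keys
    if drop:
        # drop=True removes the variables instead of demoting them.
        variables = {k: v for k, v in variables.items() if k not in keys}
    return dict(variables), remaining


variables = {
    "y": [0.1, -0.2],
    "latitude": [40.0, 41.0],
    "longitude": [-105.0, -104.0],
}

# Promote latitude and longitude to coordinates.
vars1, names1 = set_coords(variables, set(), ["latitude", "longitude"])

# Demote only latitude; it stays in the dataset as a plain variable.
vars2, names2 = reset_coords(vars1, names1, keys=["latitude"])

# Drop every coordinate, leaving just the data variable.
vars3, names3 = reset_coords(vars1, names1, drop=True)
```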

An important aspect is that using non-index coordinates (a power user feature) should be optional, just like how using and understanding Dataset objects should be optional.

shoyer · 2014-09-10T06:07:15Z

Still a few items to check off (see the open associated issues) but I don't think they are blockers to v0.3.

shoyer added the API label Aug 1, 2014

shoyer modified the milestones: 0.3, 1.0 Aug 3, 2014

This was referenced Aug 13, 2014

Consistent use of abbreviations: attrs, dims, coords #190

Closed

Checklist for v0.2 release #183

Closed

shoyer added a commit to shoyer/xarray that referenced this issue Aug 14, 2014

Don't allow indexing like DataArray.coords[0]

f5c165b

Relevant for pydata#197

shoyer added a commit to shoyer/xarray that referenced this issue Aug 14, 2014

Require specifying attrs with a keyword argument in Dataset.__init__

ec7be0b

Lays groundwork for pydata#197

shoyer mentioned this issue Aug 21, 2014

WIP: Automatic label alignment for mathematical operations #184

Closed

This was referenced Sep 2, 2014

Nonindex coords #221

Merged

Restructure DataArray internals to not use a Dataset? #117

Closed

This was referenced Sep 10, 2014

set_index(keys, inplace=False) should be both a DataArray and Dataset method. #230

Closed

Save and load coordinates according to CF conventions #231

Closed

shoyer closed this as completed Sep 10, 2014

shoyer mentioned this issue Dec 12, 2014

Use "coordinate variables"/"data variables" instead of "coordinates"/"variables"? #293

Closed
