We need some way to identify non-index coordinates #197
Comments
Some further thinking suggests that we can allow for multi-dimensional coordinates, but the only sane way to handle conflicting non-index coordinates is to drop them. Even Iris, with its strict interpretation of CF conventions, takes this approach. Raising an exception for conflicting non-scalar variables would make multi-dimensional coordinates impractical.
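To make the "drop on conflict" rule concrete, here is a minimal sketch that uses plain dicts as stand-ins for the non-index coordinates of two arrays. The function name `drop_conflicting_coords` and the dict representation are assumptions made for illustration; they are not part of xray's API.

```python
# Hypothetical sketch: dicts map coordinate names to values (scalars or
# nested lists standing in for multi-dimensional arrays).

def drop_conflicting_coords(left, right):
    """Keep coordinates that agree or appear on one side only; drop conflicts."""
    merged = dict(left)
    for name, value in right.items():
        if name in merged and merged[name] != value:
            # Conflicting values: drop the coordinate instead of raising.
            del merged[name]
        else:
            merged[name] = value
    return merged


# Two arrays sharing the same 2D latitude but taken at different times.
a = {"lat": [[10.0, 10.5], [11.0, 11.5]], "time": "2014-01-01"}
b = {"lat": [[10.0, 10.5], [11.0, 11.5]], "time": "2014-01-02"}

merged = drop_conflicting_coords(a, b)
```

In this sketch the shared 2D `lat` survives and the conflicting `time` is silently dropped, so the operation never fails because of auxiliary coordinates.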
Lays groundwork for pydata#197
Here's my current thinking on implementation:
Also, we need some new methods to make this workable (modeled off of pandas's
An important aspect is that using non-index coordinates (a power user feature) should be optional, just like how using and understanding
Still a few items to check off (see the open associated issues) but I don't think they are blockers to v0.3.
I am currently working with station data. In order to keep around latitude and longitude (I use station_id as the coordinate variable), I need to resort to some ridiculous contortions:
There has got to be an easier way to handle this.
I don't want to revert to some primitive guessing strategy (e.g., looking at `attrs['coordinates']`) to figure out which extra variables can be safely kept after mathematical operations.
Another approach would be to try to preserve everything in the dataset linked to a DataArray when doing math. But I don't really like this option either, because it would lead to serious propagation of "linked dataset variables", which are rather surprising and can have unexpected performance consequences (though at least they appear in the repr as of #128).
This leads me to a final alternative: restructuring xray's internals to provide first-class support for coordinates that are not indexes. For example, this would mean promoting `ds.coordinates` to an actual dictionary stored on a dataset, and allowing it to hold objects that aren't an `xray.Coordinate`.
Making this change transparent to users would likely require changing the `Dataset` signature to something like `Dataset(variables, coords, attrs)`. We might (yet again) want to rename `Coordinate` to something like `IndexVar`, to emphasize the notion of "index" and "non-index" coordinates. And we could get rid of the terrible "linked dataset variable".
Once we have non-index coordinates, we need a policy for what to do when adding two DataArrays for which they differ. I think my preferred approach is to not enforce that they be found on both arrays, but to raise an exception if there are any conflicting values, unless they are scalar valued, in which case they could be dropped, turned into a tuple, or given different names. (Otherwise there would be cases where you couldn't calculate `x[1] - x[0]`.)
We might even be able to keep around multi-dimensional coordinates this way (e.g., 2D lat/lon arrays for projected data)... I'll need to think about that one some more.
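The conflict policy proposed above can be sketched in a few lines. This is only an illustration under stated assumptions: plain dicts stand in for the non-index coordinates of each array, the helper names `is_scalar` and `combine_non_index_coords` are hypothetical, and of the three options mentioned for conflicting scalars (drop, tuple, rename) the sketch picks dropping.

```python
# Hypothetical sketch: none of these names are xray APIs.

def is_scalar(value):
    """Treat anything that is not a list or tuple as scalar-valued."""
    return not isinstance(value, (list, tuple))


def combine_non_index_coords(left, right):
    """Combine non-index coordinates from the two operands of a binary op.

    - A coordinate found on only one operand is kept.
    - A coordinate found on both with equal values is kept.
    - Conflicting scalar coordinates are dropped.
    - Conflicting non-scalar coordinates raise ValueError.
    """
    combined = {}
    names = list(left) + [n for n in right if n not in left]
    for name in names:
        if name not in right:
            combined[name] = left[name]
        elif name not in left:
            combined[name] = right[name]
        elif left[name] == right[name]:
            combined[name] = left[name]
        elif is_scalar(left[name]) and is_scalar(right[name]):
            continue  # conflicting scalars: drop rather than fail
        else:
            raise ValueError(f"conflicting values for coordinate {name!r}")
    return combined


# Selecting x[0] and x[1] from station data leaves scalar lat/lon on each.
x0_coords = {"latitude": 40.0, "longitude": -105.3, "network": "A"}
x1_coords = {"latitude": 37.8, "longitude": -122.4, "network": "A"}

# x[1] - x[0] stays computable: the conflicting scalars are dropped.
result = combine_non_index_coords(x1_coords, x0_coords)
```

Here `result` keeps only the `network` coordinate that both operands agree on, while two operands carrying different non-scalar `latitude` lists would raise `ValueError`.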