Treat accessor dataarrays as members of parent dataset #2517
Comments
If you want to return your newly-calculated altitude and also have it be a full data_var in your dataset, one way would be to just alter the original dataset in-place. Something like:

import xarray as xr
import pandas as pd
import xarray.testing as xrt

@xr.register_dataset_accessor('acc')
class Accessor(object):
    def __init__(self, xarray_ds):
        self._ds = xarray_ds
        self._altitude = None

    @property
    def altitude(self):
        """ An expensive calculation that results in data that not everyone needs. """
        if self._altitude is None:
            self._altitude = xr.DataArray([5, 10, 15, 20, 25],
                                          coords=[('time', self._ds.time)])
            # Here we add the calculated altitude to the dataset as a new data variable
            self._ds['altitude'] = self._altitude
        # Return just the altitude dataarray
        return self._altitude

expected = xr.Dataset({'data': (['time'], [100, 30, 10, 3, 1]),
                       'altitude': (['time'], [5, 10, 15, 20, 25])},
                      coords={'time': pd.date_range('2014-09-06', periods=5, freq='1s')})
actual = xr.Dataset({'data': (['time'], [100, 30, 10, 3, 1])},
                    coords={'time': pd.date_range('2014-09-06', periods=5, freq='1s')})

# Return newly-calculated altitude, but also store it in the actual dataset for later
altitude = actual.acc.altitude

# Check that worked
xrt.assert_equal(actual, expected)
xrt.assert_equal(actual['altitude'], actual.acc.altitude)
The only problem I see with this is that
That's true, but unless you start subclassing dataset then isn't that always going to be the case? You have some quantity which you can only calculate with either a function or an accessor method on the dataset, wouldn't you need to alter the
I think the cleanest way to do this in the long term would be to combine some sort of "lazy array" object with caching, e.g., along the lines of what's described in #2298. I'm not sure what the best solution in the short-term is, though.
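As one possible short-term option, the in-place mutation could be avoided by using a plain function that returns a new dataset with the extra variable attached. The sketch below is only an illustration: `compute_altitude` and `with_altitude` are hypothetical names, and the hard-coded values stand in for the real expensive calculation.

```python
import pandas as pd
import xarray as xr


def compute_altitude(ds):
    """Hypothetical stand-in for the expensive altitude calculation."""
    return xr.DataArray([5, 10, 15, 20, 25],
                        coords=[('time', ds['time'].values)])


def with_altitude(ds):
    """Return a dataset that has 'altitude' as a data variable.

    The calculation only runs if 'altitude' is not already present,
    and the input dataset is never modified.
    """
    if 'altitude' in ds.data_vars:
        return ds
    return ds.assign(altitude=compute_altitude(ds))


raw = xr.Dataset({'data': (['time'], [100, 30, 10, 3, 1])},
                 coords={'time': pd.date_range('2014-09-06', periods=5, freq='1s')})

# 'raw' is left untouched; 'enriched' carries altitude as a regular data variable
enriched = with_altitude(raw)
```

The trade-off is that callers have to keep hold of the returned dataset, since nothing is cached on the original object.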
Code Sample
Problem description
I've been using accessors to extend xarray with some custom computation. The altitude in the above dataset is not used every time the data is loaded, but when it is, it is an expensive computation to make (which is why I put it in as an accessor; if it isn't needed, it isn't computed).
The problem is that, once it has been computed, I'd like to be able to use it as if it were a regular data_var of the dataset, for example to interpolate on the newly computed column or to use it in a groupby.
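For illustration, this is roughly what that usage could look like once altitude is an ordinary data variable. It is a sketch under assumptions: the dataset is built by hand with the same toy values used elsewhere in this thread, and the bin edges are arbitrary.

```python
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {'data': (['time'], [100, 30, 10, 3, 1]),
     'altitude': (['time'], [5, 10, 15, 20, 25])},
    coords={'time': pd.date_range('2014-09-06', periods=5, freq='1s')},
)

# Average 'data' within altitude bins (bin edges chosen arbitrarily)
binned_mean = ds['data'].groupby_bins(ds['altitude'], bins=[0, 12, 30]).mean()

# Promote altitude to a coordinate and make it the dimension,
# so that 'data' can be looked up by altitude value
by_altitude = ds.set_coords('altitude').swap_dims({'time': 'altitude'})
at_15 = by_altitude['data'].sel(altitude=15)
```

With altitude as the dimension, `by_altitude.interp(altitude=...)` should also be applicable for values between the sampled altitudes, although that needs SciPy installed and is not exercised here.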
Please advise if I'm going about this in the wrong way and how I should think about this problem instead.
Output of xr.show_versions()