It would be nice to allow DataArray objects without named dimensions (#116). But it doesn't make much sense to put arrays without named dimensions into a Dataset.
This suggests that we should change the current model for the internals of DataArray, which works by applying operations to an internal Dataset and keeping track of the name of the array of interest.
An alternate representation would use a fixed-size, list-like attribute, coordinates, to keep track of coordinates. Putting a DataArray without named dimensions into a Dataset would raise an error.
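To make the proposal concrete, here is a rough sketch in plain Python. The names SketchArray, SketchDataset and nested_shape are hypothetical and are not part of the library; the sketch only assumes that "fixed size" means one coordinate slot per axis, and that an unnamed dimension is represented by None.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


def nested_shape(values) -> Tuple[int, ...]:
    """Return the shape of a rectangular nested list (hypothetical helper)."""
    shape = []
    while isinstance(values, list):
        shape.append(len(values))
        values = values[0] if values else None
    return tuple(shape)


@dataclass
class SketchArray:
    """Hypothetical stand-in for a DataArray that owns its coordinates."""

    values: list
    dims: Tuple[Optional[str], ...]        # None marks an unnamed dimension
    coordinates: List[Optional[Sequence]]  # fixed size: one slot per axis

    def __post_init__(self):
        ndim = len(nested_shape(self.values))
        if len(self.dims) != ndim or len(self.coordinates) != ndim:
            raise ValueError("need one dim name and one coordinate slot per axis")


class SketchDataset:
    """Hypothetical stand-in for a Dataset that requires named dimensions."""

    def __init__(self):
        self.variables = {}

    def __setitem__(self, name, array):
        # Arrays with unnamed dimensions cannot be aligned with other variables.
        if any(dim is None for dim in array.dims):
            raise ValueError(f"cannot add {name!r}: all dimensions must be named")
        self.variables[name] = array


named = SketchArray([[1, 2], [3, 4]], ("x", "y"), [[10, 20], None])
unnamed = SketchArray([1, 2, 3], (None,), [None])

ds = SketchDataset()
ds["foo"] = named        # fine: every dimension has a name
try:
    ds["bar"] = unnamed  # rejected: the dimension is unnamed
except ValueError as err:
    print(err)
```

The array no longer needs a backing dataset to exist, and the naming requirement is enforced only at the point where an array is added to a dataset.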
Positives:
This is a more transparent and obvious model for directly working with DataArray objects.
It will simplify making DataArrays without named dimensions.
It will make choices like when to drop other dataset variables in a data array operation more obvious: other variables will always be dropped, because we won't bother keeping track of a dataset anymore.
Related to my bullet 1, this will have positive performance implications for array indexing, since it will be more obvious exactly which arrays you are indexing (currently, indexing a DataArray indexes every array in its dataset).
Negatives:
This will certainly add lines of code and complexity. Making an operation work for both Datasets and DataArrays will no longer be quite so simple.
It will no longer be as straightforward to access other related variables in a DataArray. In particular, it won't work to do ds['foo'].groupby('bar') if "bar" is not a dimension in ds['foo'], unless we keep around some sort of reference to the dataset in the array. Perhaps this tradeoff is worth it: ds['foo'].groupby(ds['bar']) isn't so terrible.
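As a rough illustration of why passing the grouping array explicitly is enough, here is a plain-Python sketch. The function group_by_labels is hypothetical and is not the library's groupby; it only shows that once the labels are handed over directly, the array needs no reference back to a dataset.

```python
from collections import defaultdict


def group_by_labels(values, labels):
    """Group values by an explicitly supplied, same-length sequence of labels."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return dict(groups)


# Stand-ins for ds['foo'] and ds['bar']: two variables along the same dimension.
foo = [1.0, 2.0, 3.0, 4.0]
bar = ["a", "b", "a", "b"]

# Analogous in spirit to ds['foo'].groupby(ds['bar']), followed by a mean.
means = {label: sum(vals) / len(vals) for label, vals in group_by_labels(foo, bar).items()}
```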
CC @mrocklin, I brought this up briefly in the context of #116 during PyData.
Closing this as "won't fix". As of #197/#221, DataArray internals have been restructured to (a) not expose the underlying dataset publicly and (b) use the notion of "coordinates" (vs. variables) which clarifies things tremendously.