Restructure DataArray internals to not use a Dataset? #117

Closed
shoyer opened this issue May 6, 2014 · 1 comment

Comments

@shoyer
Member

shoyer commented May 6, 2014

It would be nice to allow DataArray objects without named dimensions (#116). But it doesn't make much sense to put arrays without named dimensions into a Dataset.

This suggests that we should change the current model for the internals of DataArray, which currently works by applying operations to an internal Dataset and keeping track of the name of the array of interest.

An alternate representation would use a fixed-size, list-like attribute, coordinates, to keep track of coordinates. Under this model, putting a DataArray without named dimensions into a Dataset would raise an error.
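To make the proposal concrete, here is a minimal sketch in plain Python. It is not the library's code: `SketchDataArray`, `SketchDataset`, and the helper `_shape` are hypothetical names invented for illustration, and a tuple stands in for the fixed-size, list-like coordinates attribute (one slot per axis, `None` where an axis has no labels).

```python
from typing import Tuple


def _shape(values) -> Tuple[int, ...]:
    """Shape of a rectangular nested list (raggedness is not checked)."""
    shape = []
    while isinstance(values, list):
        shape.append(len(values))
        values = values[0] if values else None
    return tuple(shape)


class SketchDataArray:
    """Hypothetical array that owns its coordinates instead of a Dataset."""

    def __init__(self, values, dims=None, coordinates=None):
        self.values = values
        self.shape = _shape(values)
        ndim = len(self.shape)

        # Dimension names are optional in this model.
        if dims is not None and len(dims) != ndim:
            raise ValueError("need one dimension name per axis")
        self.dims = tuple(dims) if dims is not None else None

        # Exactly one coordinate slot per axis; None means "no labels".
        coords = list(coordinates) if coordinates is not None else [None] * ndim
        if len(coords) != ndim:
            raise ValueError("need one coordinate slot per axis")
        for size, coord in zip(self.shape, coords):
            if coord is not None and len(coord) != size:
                raise ValueError("coordinate length must match axis length")
        self.coordinates = tuple(coords)  # a tuple keeps the length fixed


class SketchDataset:
    """Hypothetical container that only accepts arrays with named dimensions."""

    def __init__(self):
        self.variables = {}

    def __setitem__(self, name, array):
        if array.dims is None:
            raise ValueError(f"cannot add {name!r}: array has no named dimensions")
        self.variables[name] = array


unnamed = SketchDataArray([[1, 2, 3], [4, 5, 6]])
named = SketchDataArray(
    [[1, 2, 3], [4, 5, 6]], dims=("x", "y"), coordinates=[[10, 20], None]
)

ds = SketchDataset()
ds["foo"] = named  # fine: dimensions are named
```

Assigning `unnamed` into `ds` would raise a `ValueError` in this sketch, mirroring the error the proposal describes.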

Positives:

  1. This is a more transparent and obvious model for directly working with DataArray objects.
  2. It will simplify making DataArrays without named dimensions.
  3. It will make choices like when to drop other dataset variables in a data array operation more obvious: other variables will always be dropped, because we won't bother keeping track of a dataset anymore.
  4. Related to my bullet 1, this will have positive performance implications for array indexing, since it will be more obvious exactly which arrays you are indexing (currently indexing indexes every array in a dataset).
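The indexing cost in point 4 can be illustrated with a toy model. This is only a sketch under simplifying assumptions: variables are plain 1-D lists, and `index_via_dataset` and `index_standalone` are hypothetical helpers, not functions from the library. Each returns the indexed values together with the number of arrays it had to slice.

```python
def index_via_dataset(variables, name, key):
    """Dataset-backed model: slice every variable, then pick the one of interest."""
    indexed = {var: values[key] for var, values in variables.items()}
    return indexed[name], len(indexed)


def index_standalone(values, key):
    """Standalone model: only the array of interest is sliced."""
    return values[key], 1


variables = {
    "foo": [1, 2, 3, 4],
    "bar": [10, 20, 30, 40],
    "baz": [100, 200, 300, 400],
}

via_dataset, n_dataset = index_via_dataset(variables, "foo", slice(1, 3))
standalone, n_standalone = index_standalone(variables["foo"], slice(1, 3))
```

Both paths return the same values for `foo`, but the dataset-backed path slices all three arrays while the standalone path slices one.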

Negatives:

  1. This will certainly add lines of code and complexity. Making an operation work for both Datasets and DataArrays will no longer be quite so simple.
  2. It will no longer be as straightforward to access other related variables in a DataArray. In particular, it won't work to do ds['foo'].groupby('bar') if "bar" is not a dimension in ds['foo'], unless we keep around some sort of reference to the dataset in the array. Perhaps this tradeoff is worth it: ds['foo'].groupby(ds['bar']) isn't so terrible.
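The tradeoff in point 2 amounts to passing the grouping variable explicitly instead of looking it up by name. A minimal sketch in plain Python, where `groupby_explicit` is a hypothetical helper and the dataset is modeled as a dict of equal-length lists (neither is the library's API):

```python
from collections import defaultdict


def groupby_explicit(values, labels):
    """Group values by an explicitly supplied label array of the same length."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return dict(groups)


# Toy dataset: "bar" is a related variable, not a dimension of "foo".
ds = {"foo": [1, 2, 3, 4], "bar": ["a", "b", "a", "b"]}

# Analogous to ds['foo'].groupby(ds['bar']): the caller supplies the labels,
# so the array needs no back-reference to the dataset.
groups = groupby_explicit(ds["foo"], ds["bar"])
```

Here the array never needs to know which dataset it came from; the cost is that the caller has to hold on to the dataset and pass `ds["bar"]` in.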

CC @mrocklin, I brought this up briefly in the context of #116 during PyData.

@shoyer
Member Author

shoyer commented Sep 2, 2014

Closing this as "won't fix". As of #197/#221, DataArray internals have been restructured to (a) not expose the underlying dataset publicly and (b) use the notion of "coordinates" (vs. variables) which clarifies things tremendously.

@shoyer shoyer closed this as completed Sep 2, 2014
keewis pushed a commit to keewis/xarray that referenced this issue Jan 17, 2024