Skip to content

Modular encodings (rebased) #245

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 11, 2014
Merged

Modular encodings (rebased) #245

merged 5 commits into from
Oct 11, 2014

Conversation

shoyer
Copy link
Member

@shoyer shoyer commented Oct 2, 2014

This change is rebased on master and should let us pick up from #175. CC @akleeman


Restructured Backends to make CF convention application more consistent.

Amongst other things this includes:

  • EncodedDataStores which can wrap other stores and allow
    for modular encoding/decoding.
  • Trivial indices ds['x'] = ('x', np.arange(10)) are no longer
    stored on disk and are only created when accessed.
  • AbstractDataStore API change. Shouldn't effect external users.
  • missing_value attributes now function like _FillValue

All current tests are passing (though it could use more new ones).


Post rebase notes (shoyer, Oct 2, 2014):
Most tests are passing, though a couple are broken:

  • test_roundtrip_mask_and_scale (because this change needs a fix to not
    break the current API)
  • test_roundtrip_strings_with_fill_value on TestCFEncodedDataStore
    (I don't entirely understand why, let's come back to it later)

consistent.

Amongst other things this includes:
- EncodedDataStores which can wrap other stores and allow
  for modular encoding/decoding.
- Trivial indices ds['x'] = ('x', np.arange(10)) are no longer
  stored on disk and are only created when accessed.
- AbstractDataStore API change.  Shouldn't effect external users.
- missing_value attributes now function like _FillValue

All current tests are passing (though it could use more new ones).

Post rebase notes (shoyer, Oct 2, 2014):
Most tests are passing, though a couple are broken:
- test_roundtrip_mask_and_scale (because this change needs a fix to not
  break the current API)
- test_roundtrip_strings_with_fill_value on TestCFEncodedDataStore
  (I don't entirely understand why, let's come back to it later)
@shoyer
Copy link
Member Author

shoyer commented Oct 5, 2014

closing this for now until the discussion in #175 is resolved.

@shoyer shoyer closed this Oct 5, 2014
a model where encoding/decoding happens when a dataset is
stored to/ loaded from a DataStore.

Conventions can now be enforced at the DataStore level by
overwritting the Datastore.store() and Datastore.load() methods, or
as an optional arg to Dataset.load_store, Dataset.dump_to_store.

Includes miscelanous cleanup.
@akleeman akleeman reopened this Oct 8, 2014
'supported on Python 2.6')
try:
store = backends.ScipyDataStore(gzip.open(nc), *args, **kwargs)
except TypeError, e:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the more modern syntax is except TypeError as e

@shoyer
Copy link
Member Author

shoyer commented Oct 8, 2014

Is there a reason you switched to using accessors (e.g., get_variables) instead of properties? Properties feel a little more pythonic to me. I like the idea of making property classes (e.g., DataStoreVariables) that abstract away the set_variable nonsense behind a dictionary like interface. I can submit a patch to add that, if you like.

@shoyer shoyer mentioned this pull request Oct 8, 2014
@akleeman
Copy link
Contributor

akleeman commented Oct 9, 2014

re: using accessors such as get_variables.

To be fair, this pull request didn't really switch the behavior, it was already very similar but with different names (store_variables, open_store_variable) that were now changed to get_variables etc. I chose to continue with getters and setters because it makes it fairly clear what needs to be implemented to make a new DataStore, and allows for some post processing such as _decode_variable_name and encoding/decoding. Its not entirely clear to me how that would fit into a properties based approach so sounds like a good follow up pull request to me.

@shoyer
Copy link
Member Author

shoyer commented Oct 9, 2014

K, fine by me.

On Wed, Oct 8, 2014 at 6:37 PM, akleeman [email protected] wrote:

re: using accessors such as get_variables.

To be fair, this pull request didn't really switch the behavior, it was already very similar but with different names (store_variables, open_store_variable) that were now changed to get_variables etc. I chose to continue with getters and setters because it makes it fairly clear what needs to be implemented to make a new DataStore, and allows for some post processing such as _decode_variable_name and encoding/decoding. Its not entirely clear to me how that would fit into a properties based approach so sounds like a good follow up pull request to me.

Reply to this email directly or view it on GitHub:
#245 (comment)

@shoyer shoyer added this to the 0.3.1 milestone Oct 9, 2014
self._encoder_args = args
self._encoder_kwdargs = kwdargs

def store(self, variables, attributes):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this method to a mixin class (something like AlwaysWriteCFEncoded) and use multiple inheritance to add it to NetCDF4DataStore and ScipyDataStore?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing that would get a little awkward since the encoding is now embedded within a store, so arguments to the encoder are passed into the DataStore constructor. As a result the mixin class would need to implement a constructor which stored arguments which are intended for the encoder, and the DataStore that extends the mixin would need to distinguish between arguments intended for the store/mixin. Doing this in any sort of automated way leads to the nasty bit of logic (encoding_decorator) that I just removed.

One alteration which might make that all less nasty is to have any encoding/decoding arguments be directly passed into DataStore store/load. Though, in the interest of not further bloating this PR I'd like to save that future change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I do like the idea of passing encoding/decoding arguments directly into store/load.

On second thought, I do agree with you -- this is not enough redundant code to worry about factoring out.

if variables is None:
self._variables = OrderedDict()
else:
self._variables = variables
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could also put these on one line each:

self._variables = OrderedDict() if variables is None else variables
self._attributes = OrderedDict() if attributes is None else attributes

@shoyer
Copy link
Member Author

shoyer commented Oct 9, 2014

should be good to go once you get all the tests passing.

@shoyer shoyer force-pushed the modular-encoding-v2 branch from b33b747 to ad7df2b Compare October 11, 2014 05:37
@shoyer
Copy link
Member Author

shoyer commented Oct 11, 2014

@akleeman I went ahead and pushed fixes for the failing tests. Feel free to merge if when you're ready.

shoyer added a commit that referenced this pull request Oct 11, 2014
@shoyer shoyer merged commit 8839723 into master Oct 11, 2014
@shoyer shoyer deleted the modular-encoding-v2 branch October 11, 2014 21:30
@shoyer shoyer restored the modular-encoding-v2 branch October 11, 2014 21:30
@shoyer
Copy link
Member Author

shoyer commented Oct 11, 2014

Merged, so I can do some more encodings work on top of this.

@shoyer shoyer mentioned this pull request Oct 13, 2014
@shoyer shoyer deleted the modular-encoding-v2 branch October 23, 2014 06:27
keewis pushed a commit to keewis/xarray that referenced this pull request Jan 17, 2024
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 3.1.1 to 3.1.4.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v3.1.1...v3.1.4)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tom Nicholas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants