Skip to content

Allow for consistenty ordered coords #4755

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

max-sixty
Copy link
Collaborator

  • Tests added
  • Passes isort . && black . && mypy . && flake8
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

I took an initial stab at allowing for consistently ordered coords, which came up recently, by using a list rather than a set. It's fairly involved, and the currently proposed code fails quite a few tests.

Another option would be to use OrderedSet. Or to leave it as a set.

@max-sixty max-sixty changed the title Initial attempt at using list rather than set for coord_names Allow for consistenty ordered coords Jan 3, 2021
@dcherian
Copy link
Contributor

dcherian commented Jan 4, 2021

Another option would be to use OrderedSet.

Would this be a smaller change overall (aside from typing)?

@max-sixty
Copy link
Collaborator Author

Would this be a smaller change overall (aside from typing)?

I think slightly smaller, yes.

@max-sixty
Copy link
Collaborator Author

As discussed in dev meeting:

  • one case to look at is that of pulling a variable out of a dataset
  • consider using an OrderedSet, which will perform better in worst case

@dcherian
Copy link
Contributor

dcherian commented Jan 6, 2021

one case to look at is that of pulling a variable out of a dataset

@max-sixty see #4744

Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to submit these suggestions before the meeting, so they might be out-of-date (but possibly still helpful).

def _names(self) -> Set[Hashable]:
return set(self._data._coords)
def _names(self) -> List[Hashable]:
return list(sorted(self._data._coords))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorted always returns a list:

Suggested change
return list(sorted(self._data._coords))
return sorted(self._data._coords)

Comment on lines +293 to +294
new_coord_names = coord_names + [x for x in vars_to_replace if x not in coord_names]
new_coord_names = [x for x in new_coord_names if x not in vars_to_remove]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if we can come up with something more efficient (or if being efficient is really necessary), but this seems a bit wasteful (we iterate over set(vars_to_replace) - set(coord_names) twice). Maybe something like this:

Suggested change
new_coord_names = coord_names + [x for x in vars_to_replace if x not in coord_names]
new_coord_names = [x for x in new_coord_names if x not in vars_to_remove]
new_coord_names = [x for x in coord_names if x not in vars_to_remove]
new_coord_names += [
x
for x in vars_to_replace
if x not in coord_names and x not in vars_to_remove
]

?

Comment on lines +351 to +352
new_coord_names = coord_names + [x for x in vars_to_create if x not in coord_names]
new_coord_names = [x for x in new_coord_names if x not in vars_to_remove]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here:

Suggested change
new_coord_names = coord_names + [x for x in vars_to_create if x not in coord_names]
new_coord_names = [x for x in new_coord_names if x not in vars_to_remove]
new_coord_names = [x for x in coord_names if x not in vars_to_remove]
new_coord_names += [
x
for x in vars_to_create
if x not in coord_names and x not in vars_to_remove
]

@@ -690,7 +691,8 @@ def __init__(
self._file_obj = None
self._encoding = None
self._variables = variables
self._coord_names = coord_names
# TODO: can we remove `sorted` and let it be user-defined?
self._coord_names = sorted(coord_names)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if user-defined is a good idea for Dataset, see #4649. sorted would have the advantage of always being predictable.

@max-sixty
Copy link
Collaborator Author

max-sixty commented Jun 2, 2021

Going to close this in favor of using an OrderedSet — while I won't do this now, I can in the future, or someone is welcome to take this on. I should have realized that would be a better design in the first place!

@max-sixty max-sixty closed this Jun 2, 2021
@max-sixty max-sixty deleted the ordered-coords branch June 28, 2021 22:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants