Hypothesis strategies in xarray.testing.strategies #6908

Status: Open — wants to merge 111 commits into main

Conversation


@TomNicholas TomNicholas commented Aug 11, 2022

Adds a whole suite of hypothesis strategies for generating xarray objects, inspired by and separated out from the new hypothesis strategies in #4972. They are placed into the namespace xarray.testing.strategies, and publicly mentioned in the API docs, but with a big warning message. There is also a new testing page in the user guide documenting how to use these strategies.

EDIT: A variables strategy and user-facing documentation were shipped in #8404

@TomNicholas (Member Author)

I also added my chunking strategy from HypothesisWorks/hypothesis#3433
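
Whatever form the chunking strategy finally takes upstream, any generated chunking must satisfy one invariant: the block sizes along a dimension sum to that dimension's length. A minimal stdlib sketch of that invariant (the function name is illustrative, not dask or hypothesis API):

```python
def chunks_along(dim_size: int, chunk_len: int) -> tuple[int, ...]:
    """Split a dimension of length dim_size into blocks of at most chunk_len."""
    full, remainder = divmod(dim_size, chunk_len)
    # All full-size blocks, plus one trailing partial block if needed.
    return (chunk_len,) * full + ((remainder,) if remainder else ())

assert chunks_along(10, 3) == (3, 3, 3, 1)
assert sum(chunks_along(10, 3)) == 10
```

A real strategy would draw `chunk_len` (or an irregular sequence of block sizes) from hypothesis, but every draw has to uphold the same sum invariant.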

@github-actions bot added the topic-testing, documentation, CI, and dependencies labels and removed the topic-hypothesis label (Aug 13, 2022)
Comment on lines 262 to 272
but building a dataset from scratch (i.e. method (2)) requires building the dataset object in such a way that all of
the data variables have compatible dimensions. You can build up a dictionary of the form ``{var_name: data_variable}``
yourself, or you can use the ``data_vars`` argument to the ``data_variables`` strategy (TODO):

.. ipython:: python
:okexcept:

sparse_data_vars = xrst.data_variables(data=sparse_arrays())
sparse_datasets = xrst.datasets(data_vars=sparse_data_vars)

sparse_datasets.example()
Collaborator

I had intended to push .pin in some form upstream, but I of course forgot about the other types of strategies so I can see why that would not be desirable.

Putting the code into the definition of the composite strategy is much better than what I had before (constructing the examples using data.draw directly in the test), so that would be fine with me.

Do you know if it is possible to use make_strategies_namespace with additional parameters to the array's constructor, like units for pint or chunks for dask? I guess if we use the pint_arrays function from above we could use partial for this (and anyway, pint does not implement __array_namespace__ at the moment).
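
For the extra constructor arguments, `functools.partial` does work as suggested. A toy sketch — the `pint_arrays` function here is a stand-in for the one discussed above, not a real API:

```python
from functools import partial

def pint_arrays(shape, *, unit="dimensionless"):
    # Stand-in: a real implementation would return a hypothesis strategy
    # of pint.Quantity-wrapped arrays with the given shape and unit.
    return {"shape": shape, "unit": unit}

# Bind the unit up front so downstream code only has to supply the shape:
metre_arrays = partial(pint_arrays, unit="m")
assert metre_arrays((2, 3)) == {"shape": (2, 3), "unit": "m"}
```

The same binding trick would apply to `chunks` for dask-backed arrays.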


dcherian commented Apr 1, 2024

How do we move this forward? Even Xarray objects with just numpy arrays would be quite useful


Zac-HD commented Apr 1, 2024

I think #8404 made a lot of progress on this, including shipping the user-facing documentation. If you wanted to open a PR rebasing this set of changes on main, I think that might be most of the remaining work.

@TomNicholas (Member Author)

So I just did a monster merge of main into this branch (probably should still rebase). It won't work yet because we still need to propagate all the array_strategy_fn stuff that went through with #8404 into the signatures of the new strategies in this PR.

> How do we move this forward?

It's mostly just dealing with the above and also making sure we can generate sets of variables with alignable dimensions efficiently. We also probably should think about what we want the signatures of the more complicated strategies to be: e.g. are we wanting to pass variables to datasets? or array_strategy_fn to datasets?

> Even Xarray objects with just numpy arrays would be quite useful

A lot of the work that went into #8404 was working out how to make it general enough to handle non-numpy arrays.

dcherian and others added 4 commits August 22, 2024 07:55
* main: (214 commits)
  Adds copy parameter to __array__ for numpy 2.0 (pydata#9393)
  `numpy 2` compatibility in the `pydap` backend (pydata#9391)
  pyarrow dependency added to doc environment (pydata#9394)
  Extend padding functionalities (pydata#9353)
  refactor GroupBy internals (pydata#9389)
  Combine `UnsignedIntegerCoder` and `CFMaskCoder` (pydata#9274)
  passing missing parameters to ZarrStore.open_store when opening a datatree (pydata#9377)
  Fix tests on big-endian systems (pydata#9380)
  Improve error message on `ds['x', 'y']` (pydata#9375)
  Improve error message for missing coordinate index (pydata#9370)
  Add flaky to TestNetCDF4ViaDaskData (pydata#9373)
  Make chunk manager an option in `set_options` (pydata#9362)
  Revise (pydata#9371)
  Remove duplicate word from docs (pydata#9367)
  Adding open_groups to BackendEntryPointEngine, NetCDF4BackendEntrypoint, and H5netcdfBackendEntrypoint (pydata#9243)
  Revise (pydata#9366)
  Fix rechunking to a frequency with empty bins. (pydata#9364)
  whats-new entry for dropping python 3.9 (pydata#9359)
  drop support for `python=3.9` (pydata#8937)
  Revise (pydata#9357)
  ...
@dcherian (Contributor)

> We also probably should think about what we want the signatures of the more complicated strategies to be: e.g. are we wanting to pass variables to datasets? or array_strategy_fn to datasets?

These seem like O(ε) improvements to a really great PR.

@maxrjones (Contributor)

@TomNicholas I think these strategies would be really helpful for zarr-developers/VirtualiZarr#394 and zarr-developers/VirtualiZarr#490. Is there anything I can do to help move this forward?

@TomNicholas (Member Author)

Oh man I left this one for so long... 😞

I think the reason I didn't merge it is that currently the APIs for the variables, dataarrays, and datasets strategies all differ in how they ask you to create the actual wrapped duck arrays. They should all either accept array_strategy_fn or have that argument passed down to them.

It may be that if we merge it now we'll have to go back and make those APIs consistent later. But it may also be that that isn't too bad, and that the inconsistency just limits the usefulness of this PR for now.

Labels: CI (Continuous Integration tools), dependencies (Pull requests that update a dependency file), enhancement, topic-testing

Successfully merging this pull request may close these issues:

Public hypothesis strategies for generating xarray data

5 participants