Hypothesis strategies in xarray.testing.strategies #6908

Status: Open — wants to merge 111 commits into main

Conversation


@TomNicholas TomNicholas commented Aug 11, 2022

Adds a whole suite of hypothesis strategies for generating xarray objects, inspired by and separated out from the new hypothesis strategies in #4972. They are placed into the namespace xarray.testing.strategies, and publicly mentioned in the API docs, but with a big warning message. There is also a new testing page in the user guide documenting how to use these strategies.

EDIT: A variables strategy and user-facing documentation were shipped in #8404

@TomNicholas (Member Author)

I also added my chunking strategy from HypothesisWorks/hypothesis#3433
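
Whatever form the chunking strategy finally takes upstream, any generated chunking must satisfy one invariant: the block sizes along a dimension sum to that dimension's length. A minimal stdlib sketch of that invariant (the function name is illustrative, not dask or hypothesis API):

```python
def chunks_along(dim_size: int, chunk_len: int) -> tuple[int, ...]:
    """Split a dimension of length dim_size into blocks of at most chunk_len."""
    full, remainder = divmod(dim_size, chunk_len)
    # All full-size blocks, plus one trailing partial block if needed.
    return (chunk_len,) * full + ((remainder,) if remainder else ())

assert chunks_along(10, 3) == (3, 3, 3, 1)
assert sum(chunks_along(10, 3)) == 10
```

A real strategy would draw `chunk_len` (or an irregular sequence of block sizes) from hypothesis, but every draw has to uphold the same sum invariant.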

@github-actions bot added the topic-testing, documentation, CI, and dependencies labels and removed the topic-hypothesis label (Aug 13, 2022)
Comment on lines 262 to 272
but building a dataset from scratch (i.e. method (2)) requires building the dataset object in such a way that all of
the data variables have compatible dimensions. You can build up a dictionary of the form ``{var_name: data_variable}``
yourself, or you can use the ``data_vars`` argument to the ``data_variables`` strategy (TODO):

.. ipython:: python
:okexcept:

sparse_data_vars = xrst.data_variables(data=sparse_arrays())
sparse_datasets = xrst.datasets(data_vars=sparse_data_vars)

sparse_datasets.example()
Collaborator

I had intended to push .pin in some form upstream, but I of course forgot about the other types of strategies so I can see why that would not be desirable.

Putting the code into the definition of the composite strategy is much better than what I had before (constructing the examples using data.draw directly in the test), so that would be fine with me.

Do you know if it is possible to use make_strategies_namespace with additional parameters to the array's constructor, like units for pint or chunks for dask? I guess if we use the pint_arrays function from above we could use partial for this (and anyway, pint does not implement __array_namespace__ at the moment).
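
For the extra constructor arguments, `functools.partial` does work as suggested. A toy sketch — the `pint_arrays` function here is a stand-in for the one discussed above, not a real API:

```python
from functools import partial

def pint_arrays(shape, *, unit="dimensionless"):
    # Stand-in: a real implementation would return a hypothesis strategy
    # of pint.Quantity-wrapped arrays with the given shape and unit.
    return {"shape": shape, "unit": unit}

# Bind the unit up front so downstream code only has to supply the shape:
metre_arrays = partial(pint_arrays, unit="m")
assert metre_arrays((2, 3)) == {"shape": (2, 3), "unit": "m"}
```

The same binding trick would apply to `chunks` for dask-backed arrays.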


dcherian commented Apr 1, 2024

How do we move this forward? Even Xarray objects with just numpy arrays would be quite useful


Zac-HD commented Apr 1, 2024

I think #8404 made a lot of progress on this, including shipping the user-facing documentation. If you wanted to open a PR rebasing this set of changes on main, I think that might be most of the remaining work.

@TomNicholas (Member Author)

So I just did a monster merge of main into this branch (probably should still rebase). It won't work yet because we still need to propagate all the array_strategy_fn stuff that went through with #8404 into the signatures of the new strategies in this PR.

> How do we move this forward?

It's mostly just dealing with the above and also making sure we can generate sets of variables with alignable dimensions efficiently. We also probably should think about what we want the signatures of the more complicated strategies to be: e.g. are we wanting to pass variables to datasets? or array_strategy_fn to datasets?

> Even Xarray objects with just numpy arrays would be quite useful

A lot of the work that went into #8404 was working out how to make it general enough to handle non-numpy arrays.

dcherian and others added 4 commits August 22, 2024 07:55
* main: (214 commits)
  Adds copy parameter to __array__ for numpy 2.0 (pydata#9393)
  `numpy 2` compatibility in the `pydap` backend (pydata#9391)
  pyarrow dependency added to doc environment (pydata#9394)
  Extend padding functionalities (pydata#9353)
  refactor GroupBy internals (pydata#9389)
  Combine `UnsignedIntegerCoder` and `CFMaskCoder` (pydata#9274)
  passing missing parameters to ZarrStore.open_store when opening a datatree (pydata#9377)
  Fix tests on big-endian systems (pydata#9380)
  Improve error message on `ds['x', 'y']` (pydata#9375)
  Improve error message for missing coordinate index (pydata#9370)
  Add flaky to TestNetCDF4ViaDaskData (pydata#9373)
  Make chunk manager an option in `set_options` (pydata#9362)
  Revise (pydata#9371)
  Remove duplicate word from docs (pydata#9367)
  Adding open_groups to BackendEntryPointEngine, NetCDF4BackendEntrypoint, and H5netcdfBackendEntrypoint (pydata#9243)
  Revise (pydata#9366)
  Fix rechunking to a frequency with empty bins. (pydata#9364)
  whats-new entry for dropping python 3.9 (pydata#9359)
  drop support for `python=3.9` (pydata#8937)
  Revise (pydata#9357)
  ...
@dcherian (Contributor)

> We also probably should think about what we want the signatures of the more complicated strategies to be: e.g. are we wanting to pass variables to datasets? or array_strategy_fn to datasets?

These seem like O(ε) improvements to a really great PR.

@maxrjones (Contributor)

@TomNicholas I think these strategies would be really helpful for zarr-developers/VirtualiZarr#394 and zarr-developers/VirtualiZarr#490. Is there anything I can do to help move this forward?

@TomNicholas (Member Author)

Oh man I left this one for so long... 😞

I think the reason I didn't merge it is that currently the APIs for the variables, dataarrays, and datasets strategies all differ in how they ask you to create the actual wrapped duck arrays. They should all either accept array_strategy_fn or have that argument passed down to them.

It may be that if we merge it now we'll have to go back and make those APIs consistent later. But it may also be that that isn't too bad, and that the inconsistency just limits the usefulness of this PR for now.

Labels: CI (Continuous Integration tools), dependencies (Pull requests that update a dependency file), enhancement, topic-testing

Successfully merging this pull request may close these issues:

Public hypothesis strategies for generating xarray data

5 participants