Skip to content

Switch tests to use HDF reader instead of kerchunk-based HDF5 reader #374

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TomNicholas opened this issue Jan 9, 2025 · 6 comments · Fixed by #395
Closed

Switch tests to use HDF reader instead of kerchunk-based HDF5 reader #374

TomNicholas opened this issue Jan 9, 2025 · 6 comments · Fixed by #395
Assignees
Labels
dependencies Updates a dependency HDF reader Non-kerchunk-based HDF reader backend Kerchunk Relating to the kerchunk library / specification itself testing

Comments

@TomNicholas
Copy link
Member

Zarr v3 was officially released today, but we still can't just explicitly depend on it because kerchunk doesn't work with zarr v3. There is a compatibility branch of kerchunk, which is sufficient to get kerchunk's readers working again, but development of that branch has stalled due to errors that arise in kerchunk's MultiZarrToZarr code.

We can't just depend directly on this compatibility branch because that's janky AF.

If we try to change our dependency to zarr v3 these things will break:

  • kerchunk-based readers (HDF5, netCDF3, and FITS)
  • half our integration tests, which round-trip using the HDF5 reader.

Breaking the kerchunk-based readers wouldn't be that bad - we have @sharkinsspatial 's HDF reader as an alternative for opening HDF/netCDF4 files, and the other two are much less complex/commonly used.

Breaking our integration tests isn't really acceptable, so what we should try is swapping all those tests to use Sean's HDF reader instead of the kerchunk-based HDF5 reader. If that works then we can move forward, and the only downside is that we would have to raise a warning in the HDF5, netCDF3, and FITS readers that you need to install the weird compatibility branch of kerchunk. That's better than the current situation, where you need the weird compatibility branch if you want to use zarr v3 (and therefore icechunk!) at all.

cc @norlandrhagen @mpiannucci

@TomNicholas TomNicholas added dependencies Updates a dependency HDF reader Non-kerchunk-based HDF reader backend Kerchunk Relating to the kerchunk library / specification itself testing labels Jan 9, 2025
@sharkinsspatial sharkinsspatial self-assigned this Jan 27, 2025
@jsignell
Copy link
Contributor

Most of the integration tests in https://github.com/zarr-developers/VirtualiZarr/blob/main/virtualizarr/tests/test_integration.py are parameterized to use both Sean's HDF5 reader and the kerchunk reader.

Do we want to just remove the kerchunk reader from those tests or figure out a pattern for skipping the kerchunk reader run when kerchunk is unavailable?

@TomNicholas
Copy link
Member Author

Most of the integration tests in https://github.com/zarr-developers/VirtualiZarr/blob/main/virtualizarr/tests/test_integration.py are parameterized to use both Sean's HDF5 reader and the kerchunk reader.

Yeah I realised that yesterday too.

Do we want to just remove the kerchunk reader from those tests

See #392 (comment) - the objective is to be able to run the tests without kerchunk even being installed. I think that means that for now we just remove the kerchunk reader from those tests.

@jsignell
Copy link
Contributor

the objective is to be able to run the tests without kerchunk even being installed

Right! But there could be tests that do run if kerchunk is installed and are just skipped otherwise.

Anyways I am going to take this one on now. I think it makes sense to do this one before #376

@TomNicholas
Copy link
Member Author

TomNicholas commented Jan 28, 2025

there could be tests

We want to be able to pin zarr-python>=3.0.0, and run all tests with it in a CI env that only has released versions of packages installed. Currently that completely precludes using Kerchunk at all, so there should not be any tests that even might use Kerchunk.

it makes sense to do this one before #376

I agree, thanks for taking this on.

@jsignell
Copy link
Contributor

there should not be any tests that even might use Kerchunk.

It felt weird to delete tests that exercise functionality. So see what you think of the approach.

@TomNicholas
Copy link
Member Author

It felt weird to delete tests that exercise functionality.

I guess it's fine for us to keep tests that will never be run by the CI (until kerchunk reaches v3 compatibility and can be re-instated).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Updates a dependency HDF reader Non-kerchunk-based HDF reader backend Kerchunk Relating to the kerchunk library / specification itself testing
Projects
Development

Successfully merging a pull request may close this issue.

3 participants