Add support for NETCDF4_CLASSIC to h5netcdf engine #10686

huard · 2025-09-03T05:02:22Z

Added logic in the h5netcdf engine to write pseudo NETCDF4_CLASSIC files, reusing the encoding logic used by the `netcdf4` engine.

Files generated with this PR using the latest h5netcdf release (1.6.4) won't be recognized by third-party software as genuine NETCDF4_CLASSIC files, in part because they lack the hidden _nc3_strict global attribute. There are other differences from netCDF4-generated files, including string attribute padding and how _FillValue is stored. Changes to h5netcdf will be necessary to make the files fully compliant with the CLASSIC format.

[x] Closes #10676: Support "NETCDF4_CLASSIC" format with engine h5netcdf
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst

shoyer · 2025-09-03T18:17:33Z

Xarray currently has no logic to build these metadata attributes; all of that is handled in h5netcdf.

We should also make sure that trying to use NetCDF4-only features (e.g., groups) results in an error.

huard · 2025-09-03T18:48:40Z

The last commit uses h5dump to display differences between the expected and actual content of the HDF5 file. I was also able to add a _nc3_strict global attribute.

I can try to raise an error if groups are used.

The remaining differences are related to the SUPERBLOCK version, the STRPAD character, and the _FillValue. Not sure I'll be able to resolve those.

    SUPER_BLOCK {
  -    SUPERBLOCK_VERSION 0
  ?                       ^
  +    SUPERBLOCK_VERSION 2
  ?                       ^
...
         ATTRIBUTE "foo" {
             DATATYPE  H5T_STRING {
                STRSIZE 8;
  -             STRPAD H5T_STR_NULLPAD;
  ?                                ^^^
  +             STRPAD H5T_STR_NULLTERM;
  ?                                ^^^^
                CSET H5T_CSET_ASCII;
                CTYPE H5T_C_S1;
             }
...
          ATTRIBUTE "_FillValue" {
             DATATYPE  H5T_IEEE_F64LE
  -          DATASPACE  SCALAR
  +          DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
             DATA {
             (0): nan
             }
          }

shoyer · 2025-09-03T20:04:08Z

If you really want to get metadata attributes and precise HDF5 types right, that should all be handled in h5netcdf. I think that's also the right place for h5dump tests.

In Xarray, all we should be doing for NETCDF4_CLASSIC is coercing some dtypes (using Xarray's encoders) to NetCDF3 compatible types.
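As a rough illustration of the kind of dtype coercion described above, here is a minimal NumPy-only sketch. The mapping `NC3_COERCIONS` and the helper `coerce_to_nc3` are hypothetical names, not xarray's encoder API, and the table covers only two dtypes for brevity.

```python
import numpy as np

# Hypothetical, deliberately incomplete mapping from dtypes that the
# classic data model lacks to the closest classic-compatible dtype.
NC3_COERCIONS = {"int64": "int32", "bool": "int8"}


def coerce_to_nc3(arr: np.ndarray) -> np.ndarray:
    """Cast to a classic-compatible dtype, refusing lossy casts."""
    target = NC3_COERCIONS.get(arr.dtype.name)
    if target is None:
        return arr  # dtype needs no coercion in this sketch
    cast = arr.astype(target)
    # Compare values after the cast so overflow is reported, not hidden.
    if not np.array_equal(cast, arr):
        raise ValueError(f"could not safely cast {arr.dtype} to {target}")
    return cast


small = coerce_to_nc3(np.array([1, 2, 3], dtype="int64"))
print(small.dtype)  # int32
```

Values that do not fit the narrower type, such as `2**40` stored as int64, raise a `ValueError` in this sketch instead of being silently wrapped.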

kmuehlbauer · 2025-09-03T22:51:13Z

@huard Thanks for pushing this!

For the superblock issue, please pass the kwarg libver="earliest" when opening the file for writing. This will create the file with superblock version 0 for maximum backwards compatibility.

For NULLPAD vs. NULLTERM, there is some reading material in PyTables/PyTables#264 and h5netcdf/h5netcdf#116. This would need to be implemented in h5netcdf, if need be.

huard · 2025-09-04T15:25:27Z

@kmuehlbauer Thanks for the references, this is really helpful!

I'll remove the low-level stuff from this branch (_nc3_strict) and bring it into h5netcdf.

huard · 2025-09-09T02:50:51Z

This is ready for review.

While I added a test doing a roundtrip between the netCDF4 and h5netcdf engines in CLASSIC format, it does not check how data are actually written inside the HDF5 file, only that files can be written and read consistently by xarray. Non-standard reading rules can hide non-standard writing rules. I'm planning to add "binary compatibility" tests within h5netcdf.
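A toy analogy for the last point, using only the standard library: a writer and reader that share the same deviation from a made-up specification (big-endian 32-bit integers) still pass a roundtrip test, while a byte-level comparison exposes the problem. All names here are hypothetical and unrelated to HDF5 or h5netcdf internals.

```python
import struct


def write_value(value: int) -> bytes:
    # Deviates from the made-up spec: little-endian instead of big-endian.
    return struct.pack("<i", value)


def read_value(raw: bytes) -> int:
    # Shares the writer's deviation, so round trips still succeed.
    return struct.unpack("<i", raw)[0]


# What a spec-following writer would emit for the value 1.
SPEC_BYTES = struct.pack(">i", 1)

roundtrip_ok = read_value(write_value(1)) == 1
bytes_match_spec = write_value(1) == SPEC_BYTES
print(roundtrip_ok, bytes_match_spec)  # True False
```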

shoyer · 2025-09-09T03:50:31Z

xarray/backends/h5netcdf_.py

+            if isinstance(value, bytes):
+                value = np.bytes_(value)


Why this special logic only for converting bytes? This seems unrelated to what we need for NETCDF4_CLASSIC.

huard · 2025-09-09T14:36:42Z

To make sure strings are written as NC_CHAR, not NC_STRING. See https://engee.com/helpcenter/stable/en/julia/NetCDF/strings.html

This is in fact the detail that our third-party C++ software choked on. The netCDF C library has both nc_get_att_text and nc_get_att_string functions; calling nc_get_att_text on an NC_STRING attribute raises an error.

shoyer · 2025-09-09T03:53:36Z

xarray/tests/test_backends.py

+    def test_string_attributes_stored_as_char(self, tmp_path):
+        import h5netcdf
+
+        original = Dataset(attrs={"foo": "bar"})
+        store_path = tmp_path / "tmp.nc"
+        original.to_netcdf(store_path, engine=self.engine, format=self.file_format)
+        with h5netcdf.File(store_path, "r") as ds:
+            # Check that the attribute is stored as a char array
+            assert ds._h5file.attrs["foo"].dtype == np.dtype("S3")


NumPy's S dtype actually corresponds to bytes, not str. I don't think we want to use it for storing attributes in general.

huard · 2025-09-09T15:14:33Z

Using fixed-width chars replicates the behavior of the netCDF4 backend for the CLASSIC format. Again, this has to do with the NC_CHAR vs. NC_STRING types.

Sticking as close as possible to netCDF4 output increases my confidence that the h5netcdf outputs will be compatible with third-party software expecting the CLASSIC format.
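For readers unfamiliar with the distinction, the sketch below shows what "fixed-width chars" means on the NumPy side. `to_fixed_width_char` is a hypothetical helper, not the PR's `convert_string`, and the UTF-8 encoding step is an assumption made for illustration.

```python
import numpy as np


def to_fixed_width_char(value):
    """Hypothetical helper: represent text as a fixed-width bytes scalar."""
    if isinstance(value, str):
        value = value.encode("utf-8")  # encoding choice is an assumption
    if isinstance(value, bytes):
        return np.bytes_(value)
    return value  # leave numbers and arrays untouched


attr = to_fixed_width_char("bar")
print(attr.dtype == np.dtype("S3"))  # True
print(np.array("bar").dtype.kind)    # U
```

The `S3` dtype is a three-byte fixed-width string, which is what the test quoted above asserts for the stored attribute; a plain Python `str` maps to NumPy's Unicode kind instead.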

shoyer · 2025-09-09T03:54:50Z

xarray/tests/test_backends.py

+    def test_group_fails(self):
+        # Check writing group data fails with CLASSIC format
+        original = create_test_data()
+        with pytest.raises(ValueError):


Please test the full error message (or a significant fraction) using the match argument.
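A minimal sketch of what such a test could look like. `check_group_allowed` is a hypothetical stand-in for the guard added to xarray/backends/h5netcdf_.py in this PR; only the error message is taken from the diff. Since pytest applies `match` with `re.search`, the literal message is escaped first.

```python
import re

import pytest


def check_group_allowed(format, group):
    # Hypothetical stand-in for the guard in the h5netcdf backend.
    if format == "NETCDF4_CLASSIC" and group is not None:
        raise ValueError("Cannot create sub-groups in `NETCDF4_CLASSIC` format.")


def test_group_fails():
    # Escape the message so "." and "-" are matched literally.
    expected = re.escape("Cannot create sub-groups in `NETCDF4_CLASSIC` format.")
    with pytest.raises(ValueError, match=expected):
        check_group_allowed("NETCDF4_CLASSIC", group="child")


test_group_fails()
```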

shoyer · 2025-09-09T03:56:49Z

xarray/tests/test_backends.py

@@ -4759,7 +4806,7 @@ def test_encoding_unlimited_dims(self) -> None:
    @requires_scipy
    def test_roundtrip_via_bytes(self) -> None:
        original = create_test_data()
-        netcdf_bytes = original.to_netcdf()
+        netcdf_bytes = original.to_netcdf(engine="scipy")


This test is indeed mistakenly using h5netcdf, not scipy (which will be fixed by #10624), but the fact that you apparently needed to change it to get tests passing is concerning.

huard · 2025-09-09T15:20:06Z

I've removed it.

The error was due to an earlier version of the PR badly managing the logic for the format=None case. I fixed that but left this change in, not knowing there was an upcoming fix elsewhere.

shoyer · 2025-09-09T03:58:09Z

xarray/backends/h5netcdf_.py

+        if format is not None and Version(h5netcdf.__version__) > Version("1.6.4"):
+            kwargs["format"] = format


Has this been in fact added to h5netcdf yet?

We should figure out the API h5netcdf will accept first before using it in xarray.

huard · 2025-09-09T14:22:49Z

No change has been made in h5netcdf yet.

This PR assumes that h5netcdf knows nothing of the format argument for now, but that the next version will.

The plan is then to open a PR in h5netcdf to support the CLASSIC format without breaking xarray tests.

But indeed, I'm assuming h5netcdf will accept a PR adding the format argument to their API, and that it will be merged before the next release. This might be over-optimistic and I'm happy to follow suggestions here.

I really think we should finish the API on the h5netcdf side first. That eliminates the non-zero risk that h5netcdf releases a new version before your PR to h5netcdf lands, or that h5netcdf settles on a different API for this.

@huard I agree with @shoyer and I'm gladly supporting a PR over at h5netcdf.

We should move fast, as a major version change is lurking around the corner. The integration of pyfive is almost ready. We could add the NETCDF4_CLASSIC changes before or on top of the pyfive changes and release them together as h5netcdf v2.0.0. That way we could also avoid a possible deprecation cycle.

Sounds good, see h5netcdf/h5netcdf#283
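To make the concern in this thread concrete, here is the version gate from the diff pulled out into a standalone function. `build_open_kwargs` and the version strings passed to it are hypothetical; only the `> Version("1.6.4")` comparison comes from the PR. It shows that any release newer than 1.6.4, including a development build, would receive the `format` keyword whether or not that release actually accepts it.

```python
from packaging.version import Version


def build_open_kwargs(format, h5netcdf_version):
    """Forward `format` only to releases newer than 1.6.4 (hypothetical helper)."""
    kwargs = {}
    if format is not None and Version(h5netcdf_version) > Version("1.6.4"):
        kwargs["format"] = format
    return kwargs


print(build_open_kwargs("NETCDF4_CLASSIC", "1.6.4"))       # {}
print(build_open_kwargs("NETCDF4_CLASSIC", "1.6.5.dev0"))  # {'format': 'NETCDF4_CLASSIC'}
```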

shoyer · 2025-09-09T03:58:32Z

xarray/backends/h5netcdf_.py

+        if format == "NETCDF4_CLASSIC" and group is not None:
+            raise ValueError("Cannot create sub-groups in `NETCDF4_CLASSIC` format.")


Does h5netcdf give a suitable error message here already?

huard · 2025-09-09T14:19:54Z

h5netcdf.File does not even have a format argument, so no.

shoyer · 2025-09-09T04:01:41Z

xarray/backends/h5netcdf_.py

@@ -392,7 +421,7 @@ def prepare_variable(
            nc4_var = self.ds[name]

        for k, v in attrs.items():
-            nc4_var.attrs[k] = v
+            nc4_var.attrs[k] = self.convert_string(v)


I think this is doing the variable attribute conversion twice?

huard · 2025-09-09T15:03:37Z

This is for variable attributes only; the set_attribute call above operates only on global attributes.

shoyer · 2025-09-09T04:02:26Z

xarray/backends/h5netcdf_.py

+
        return _encode_nc4_variable(variable, name=name)


Please put this inside an else clause so the logic is clearer

huard · 2025-09-09T14:46:35Z

Done. Some linters complain about the extra else, but I agree this is clearer.

shoyer · 2025-09-09T04:02:40Z

xarray/backends/h5netcdf_.py

    def set_attribute(self, key, value):
+        value = self.convert_string(value)


I'm not sure you need a helper function here

huard · 2025-09-09T14:45:32Z

The convert_string function is called from two different places (variable attributes and global attributes), hence the helper function.

huard

Thanks for the review. Made the suggested changes, but I'm afraid the string attributes need to be saved as fixed width char arrays to be compliant with the CLASSIC file format.

huard added 6 commits September 2, 2025 14:03

Support NETCDF4_CLASSIC in the h5engine backend

68d5c73

convert bytes attributes to numpy.bytes_ in NETCDF4_CLASSIC format with h5netcdf engine

120af07

added test to confirm string attributes are stored as numpy char arrays with NETCDF4_CLASSIC

f8f44f0

Added change to whats-new

522d37d

run pre-commit

783c407

Added test comparing CDL representation of test data written with netcdf4 and h5netcdf

2f1c781

github-actions bot added topic-backends io labels Sep 3, 2025

huard added 3 commits September 3, 2025 01:02

Added global attribute to test data. Apply CLASSIC conversion to variable attributes as well.

b142d38

Merge branch 'main' into fix_10676

b14c373

Use h5dump to compare file content instead of CDL. Add _nc3_strict attribute.

909a96c

Merge branch 'fix_10676' of github.com:Ouranosinc/xarray into fix_10676

14d22a1

huard mentioned this pull request Sep 3, 2025

Option to write netcdf in "classic" mode h5netcdf/h5netcdf#280

Open

raise error if writing groups to CLASSIC file.

35b50ce

huard added 5 commits September 5, 2025 13:14

remove h5dump test. Remove _nc3_strict attribute (should go into h5netcdf).

a187c8d

fix h5netcdf version check.

0c1bc08

Merge branch 'main' into fix_10676

10707ee

try to fix tests

03ea2de

Set default format to NETCDF4 instead of None, because passing None to `netCDF4_.get_datatype` skips required conversions. Remove global attribute from create_test_data because it impacts other tests in other files.

592b98b

shoyer reviewed Sep 9, 2025

huard and others added 3 commits September 9, 2025 11:21

Apply suggestions from code review

13b60d0

Co-authored-by: Stephan Hoyer <[email protected]>

Suggestions from review.

cf8b4be

Merge branch 'fix_10676' of github.com:Ouranosinc/xarray into fix_10676

d0b0948

huard commented Sep 9, 2025

huard mentioned this pull request Sep 9, 2025

Add partial support for NETCDF4_CLASSIC format h5netcdf/h5netcdf#283

Open

3 tasks
