
Conversation

@mishaschwartz
Contributor

  • Add ability to parse vertical data from CF metadata extracted from NCML files

This includes writing geometries and bounding boxes with vertical data, and removes the restriction that bounding box values have to be sorted (since longitude values are allowed to have a greater min than max if the box crosses the antimeridian).

  • Convert longitude values in range 0-360 degrees to -180-180 degrees to comply with WGS84

If longitude values are originally given in the 0-360 range, those values will still be used for the datacube dimensions. However, STAC requires that bbox and geometry values follow WGS84, which requires values to be between -180 and 180.
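For illustration, the 0-360 to -180-180 normalization amounts to a wrap-around shift; a minimal sketch (the function name is hypothetical, not from the PR):

```python
def normalize_longitude(lon: float) -> float:
    """Map a longitude in [0, 360) to the WGS84 range [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0

assert normalize_longitude(0) == 0
assert normalize_longitude(190) == -170
assert normalize_longitude(359) == -1
```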

Also removes the geometry_model attribute, because it was unused and geometries can now be either a Polygon (if the extent doesn't cross the antimeridian) or a MultiPolygon (if it does).
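Not the PR's actual code, but a minimal sketch of that Polygon/MultiPolygon choice, assuming shapely and a rectangular extent (the helper name is hypothetical):

```python
from shapely.geometry import MultiPolygon, Polygon, box

def bbox_geometry(west: float, south: float, east: float, north: float):
    """Return a Polygon, or a MultiPolygon split at the antimeridian
    when the box crosses it (west > east after WGS84 normalization)."""
    if west <= east:
        return box(west, south, east, north)
    # RFC 7946, section 3.1.9: split geometries that cross the antimeridian.
    return MultiPolygon([
        box(west, south, 180.0, north),
        box(-180.0, south, east, north),
    ])
```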

@mishaschwartz changed the title from "Spatial extents from cf update" to "Spatial extents from cf data update" on Dec 1, 2025
@mishaschwartz changed the title from "Spatial extents from cf data update" to "Spatial extents now support vertical component and are WGS84 compliant" on Dec 1, 2025
Collaborator

@fmigneault left a comment


Add ability to parse vertical data from CF metadata extracted from NCML files

Nice!

(since longitude values are allowed to have a greater min than max if the box crosses the antimeridian).

The standard and recommended way to handle this is to split the geometry: https://www.rfc-editor.org/rfc/rfc7946#section-3.1.9

It seems to be what you are trying to do with the multi-polygon, but then shouldn't the min/max checks still remain valid? Or is the relaxation only for the input validation, with the result still validated to respect these conditions?

Should it be a CLI option that auto-patches them (leaving the strict check as the default)? It seems risky to assume and transparently let actually incorrect values be "fixed".

Another thing to consider is that fixing the antimeridian is trickier depending on how often / at which angles the geometry crosses it. A quick fix could be to use this lib from the same dev working on STAC: https://github.com/gadomski/antimeridian
Maybe it should be considered?
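For reference, usage would look roughly like this (a sketch based on the library's documented fix_polygon helper; treat the exact API as an assumption):

```python
import antimeridian
from shapely.geometry import Polygon

# A rectangle whose longitudes wrap across the antimeridian.
poly = Polygon([(170, -10), (-170, -10), (-170, 10), (170, 10)])

# The library splits it into a MultiPolygon at +/-180.
fixed = antimeridian.fix_polygon(poly)
print(fixed.geom_type)  # expected: "MultiPolygon"
```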

  • Convert longitude values in range 0-360 degrees to -180-180 degrees to comply with WGS84

I find this to be risky. There are a lot of assumptions if the numbers are not already in range (they might not be a simple offset of 180 deg.). If any CRS other than WGS84 is used, it MUST be indicated somewhere since it cannot be assumed, and the proj details (https://github.com/stac-extensions/projection#fields) (and/or cube:dimensions.*.reference_system) should most definitely reflect them as well, since the linked data will still be in that reference system.

Comment on lines +176 to +177:

```python
elif key in ["Z", "vertical"]:
    extent = geo_data["vertical_min"], geo_data["vertical_max"]
```
Collaborator


In this case, the unit seems somewhat more important than for the other spatial coordinates, since there is no default CRS to rely on (m, km, etc.?).

Other dimensions should also include unit and step details if provided (e.g.: hours instead of seconds, resolution between samples), as in your test file.

If the original coords were not in (-180, 180), and are still in their original range within the NetCDF file, I would expect a different CRS. In such a case, the reference_system should be indicated as well.
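To make that concrete, a hedged sketch of a vertical entry under cube:dimensions with explicit unit, step, and reference_system (STAC datacube extension; all values are illustrative assumptions, not from the PR):

```python
vertical_dimension = {
    "type": "spatial",
    "axis": "z",
    "extent": [0.0, 1000.0],   # vertical_min, vertical_max
    "step": 50.0,              # resolution between samples, if regular
    "unit": "m",               # metres vs km cannot be inferred from a default CRS
    "reference_system": 5703,  # e.g. EPSG:5703 (NAVD88 height), an assumption
}
```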

@fmigneault
Collaborator

Highlighting this fix just in case, since it works with the same concepts.
stac-utils/pystac#1596 has been merged to address the issue with vertical dimension.
Relates to #125

@mishaschwartz
Contributor Author

The standard and recommended way to handle this is to split the geometry: https://www.rfc-editor.org/rfc/rfc7946#section-3.1.9

Yes, but the recommended way to handle the antimeridian for a bounding box is different: https://www.rfc-editor.org/rfc/rfc7946#section-5.2
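Concretely, section 5.2 keeps a single bbox but allows the western value to exceed the eastern one (coordinates below are illustrative):

```python
# RFC 7946, section 5.2: a bbox crossing the antimeridian keeps the
# [west, south, east, north] order, with west > east.
bbox = [170.0, -10.0, -170.0, 10.0]
```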

Should it be a CLI option that auto-patches them (and leave it a strict check by default)? It seems risky to assume and transparently let actually incorrect values to be "fixed".

It depends on how we want to handle values that are not compliant with the STAC spec. STAC wants the geometry and bounding boxes to be compliant with WGS84, but it doesn't say anything about other properties (like cube:dimensions). I have opted to require that the geometry and bbox be compliant with STAC (and WGS84), while cube:dimensions can contain the "original" values.

https://github.com/gadomski/antimeridian Maybe it should be considered?

Yes, I've looked into that library and it's a good one. It is not necessary in this case, though, because we only ever deal with rectangular geometries (given the information available from the CF metadata). In the future, if we create a populator that takes input from more complex geometries, we should look into it as a potential solution.

I find this to be risky. There are a lot of assumptions if the numbers are not already in range (they might not be a simple offset of 180deg.)

Agreed, but this doesn't make any assumptions beyond those that were already made. Before, the code simply assumed that the values were in degrees east and degrees north. We could start converting all the possible coordinate systems to degrees east and north as well, but for this PR is it sufficient to warn if another coordinate system is used? I don't know that I have the expertise to convert coordinate systems right now.

and the proj details (https://github.com/stac-extensions/projection#fields) (and/or cube:dimensions.*.reference_system) should most definitely reflect them as well since the linked data will still be in that reference system.

This is one of the reasons why I left the original values in the cube: properties, so that the information wouldn't be lost.

@fmigneault
Collaborator

fmigneault commented Dec 2, 2025

I have opted to require that the geometry and bbox be required to be compliant with STAC (and WGS84) and cube:dimensions can contain the "original" values.

I think this is the right way as well, provided cube:dimensions contains their reference_system. Otherwise, it "Defaults to EPSG code 4326" (i.e., the WGS84 datum), which makes interpretation of the values incorrect if they are left in the original CRS. The geometry and bbox in WGS84 are there to allow searching across collections without needing to convert.

However, I think there is a distinction to be made here. The STAC result should definitely be WGS84, but the inputs could be anything, which we convert. I think it was fair that values were assumed to be WGS84 and used as-is before, since no conversion was applied (the same "default" applied across the internet when a CRS is omitted), but since a conversion is now performed, it should be stricter about the CRS it uses.

Conversion can be done with pyproj, building a Transformer from the source CRS (whatever it turns out to be) to WGS84. Then you simply pass the coordinates to the utility and it will handle all relevant conversions.
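A minimal sketch of that pyproj conversion (the source CRS here, EPSG:3857, is an assumption for illustration):

```python
from pyproj import Transformer

# Convert from an arbitrary source CRS (EPSG:3857 assumed here) to WGS84.
# always_xy=True forces (x/lon, y/lat) axis order regardless of CRS defaults.
transformer = Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)
lon, lat = transformer.transform(1113194.9, 5621521.5)
print(round(lon), round(lat))  # roughly (10, 45)
```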

@mishaschwartz
Contributor Author

I think this is the right way as well, if cube:dimensions contains their reference_system.

Yes, I agree. The main question is how to handle a CRS that is not specified in the metadata. CF uses the geospatial_bounds_crs and geospatial_bounds_vertical_crs attributes to indicate which CRS is intended, but if they're not specified, do we try to guess based on other values (like the units?) or should we just default to assuming WGS84?
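For context, reading those attributes with xarray would look roughly like this (the fallback behaviour is exactly the open question; the file name is hypothetical):

```python
import xarray as xr

ds = xr.open_dataset("example.nc")  # hypothetical file

# geospatial_bounds_crs is optional in practice; when absent, do we guess
# from units or assume WGS84? That is the question raised above.
horizontal_crs = ds.attrs.get("geospatial_bounds_crs", "EPSG:4326")
vertical_crs = ds.attrs.get("geospatial_bounds_vertical_crs")  # may be None
```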

What the STAC result should be is definitely WGS84, but the inputs could be anything, which we convert.

Agreed, that's what we're trying to do here.

Conversion can be done with pyproj

Ok, I was going to leave more complex conversions for a later stage but I'm happy to do it all in this PR as well.

@fmigneault
Collaborator

CF uses the geospatial_bounds_crs and geospatial_bounds_vertical_crs attributes to indicate which CRS is intended, but if they're not specified, do we try to guess based on other values

My honest opinion would be to correct the reference NetCDF/NCML metadata, solving the problem at the source: using anything other than EPSG:4326 values in geospatial_bounds without specifying geospatial_bounds_crs is by definition incorrect, and therefore not CF-compliant. Loading that NetCDF in various software could lead to incorrect/inconsistent results between implementations.

That being said, for "patching" (since metadata errors happen...), I wouldn't mind having some kind of input flag where the data source CRS is injected. However, it would have to be a purposely set flag (not guessed), and it should probably still honour any explicitly indicated CRS (in case only some of the metadata files are incorrectly annotated). If really forcing it, it could be a 2-option approach, something like --fix-crs vs --force-crs, with the latter being more brutal/direct.

For geospatial_bounds_vertical_crs, it is more lax, since 3D coordinates are not typically "default". It could be valid for it to be missing in some cases: it cannot be used if geospatial_bounds_crs happens to be directly a 3D CRS (e.g.: EPSG:4979 for "EPSG:4326 in 3D"). Therefore, geospatial_bounds_vertical_crs could be missing if the vertical reference is provided via geospatial_bounds_crs, but then that field is again mandatory to indicate the Z component. We need to consider the various reference combinations depending on the CRS codes.
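One way to detect those combinations with pyproj (a sketch; EPSG:4979 used as the example 3D CRS):

```python
from pyproj import CRS

crs = CRS.from_epsg(4979)  # "WGS 84 (3D)"

# A 3D CRS carries the vertical axis itself, so a separate
# geospatial_bounds_vertical_crs would be redundant in that case.
is_3d = len(crs.axis_info) == 3
print(is_3d)  # True
```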
