
Sentinel-2 EOPF GeoZarr #16



Draft: wants to merge 4 commits into base: main


@emmanuelmathot commented Jun 18, 2025

Summary

Branching from the initial PR by @wietzesuijker 🙏🏼, this PR is a WIP for experimenting with GeoZarr for EOPF Sentinel-2 data.
For now, it simply adds Cloud Optimized GeoTIFF (COG) style multiscale functionality to the geozarr-examples repository, implementing overview levels that keep the native projection and follow COG overview conventions.

Changes

🆕 New Helper Module: src/geozarr_examples/cog_multiscales.py

A comprehensive utility module providing:

  • COG-style overview calculation with /2 downsampling logic
  • Native CRS preservation (no reprojection to Web Mercator)
  • Proper coordinate array generation for each overview level
  • CRS storage with multiple format support (WKT, EPSG, PROJ4)
  • NumPy-based downsampling (no dependencies beyond NumPy)
  • Verification and visualization functions
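A minimal sketch of NumPy-only /2 downsampling, assuming a plain 2x2 block mean. `downsample_2x_mean` is a hypothetical name, not the helper in `cog_multiscales.py`, and the PR's resampling choice may differ.

```python
import numpy as np


def downsample_2x_mean(arr: np.ndarray) -> np.ndarray:
    """Halve the resolution of a 2-D array by averaging non-overlapping 2x2 blocks.

    Hypothetical helper (not the function in cog_multiscales.py). A trailing odd
    row or column is dropped so the array splits evenly into 2x2 blocks, and
    nodata handling is deliberately left out.
    """
    if arr.ndim != 2:
        raise ValueError("expected a 2-D array")
    rows, cols = arr.shape
    trimmed = arr[: rows - rows % 2, : cols - cols % 2].astype(np.float64)
    # (H, W) -> (H/2, 2, W/2, 2): each 2x2 block lies along axes 1 and 3.
    blocks = trimmed.reshape(rows // 2, 2, cols // 2, 2)
    return blocks.mean(axis=(1, 3))


band = np.arange(16, dtype=np.uint16).reshape(4, 4)
print(downsample_2x_mean(band))
```

A block mean is only one option; nearest-neighbour or mode resampling would be more appropriate for categorical layers such as scene classification.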

🔄 Updated Notebook: docs/examples/06_multiscales_as_WebMercatorQuad_EOPFZarrV3.ipynb

  • Simplified and maintainable by importing helper functions
  • Focuses on workflow rather than implementation details
  • Documentation with clear examples
  • Demonstrates xarray.plot() working with proper coordinates

Key Features

Follows COG Conventions

  • Overviews maintain native projection (UTM in example)
  • Uses /2 downsampling logic (1:1, 1:2, 1:4, 1:8, etc.)
  • Stops when dimensions < 256 pixels
  • Compatible with WebMercatorQuad TMS for serving
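One way to express the level calculation, assuming "stops when dimensions < 256 pixels" means halving continues while a level is still larger than 256 pixels in either dimension. `cog_overview_levels` is a hypothetical helper, not the PR's code.

```python
def cog_overview_levels(height: int, width: int, min_size: int = 256) -> list[tuple[int, int, int]]:
    """Return (factor, height, width) for the native grid and each /2 overview.

    Assumption: halving continues while the current level is larger than
    `min_size` in either dimension, so the coarsest level fits in a single
    min_size x min_size tile. Odd sizes are floor-divided (a trailing
    row/column is dropped), and halving also stops before any side reaches 0.
    """
    levels = [(1, height, width)]
    while True:
        factor, h, w = levels[-1]
        if max(h, w) <= min_size or min(h, w) < 2:
            break
        levels.append((factor * 2, h // 2, w // 2))
    return levels


for factor, h, w in cog_overview_levels(10980, 10980):
    print(f"1:{factor:<3} {h} x {w}")
```

For a 10980 x 10980 Sentinel-2 10 m band this yields factors 1 through 64, with a coarsest level of 171 x 171 pixels.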

Maintains Geospatial Integrity

  • Each overview level has proper x/y coordinate arrays
  • Native CRS preserved across all levels
  • Proper geotransform information maintained
  • CF-compliant metadata and standard names
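A sketch of how per-level x/y coordinate arrays could be derived from the native geotransform. It assumes a north-up grid, GDAL's six-element geotransform ordering, and overviews that keep the native top-left origin with the pixel size scaled by the level factor. `level_coordinates` is hypothetical and the transform values are made up for illustration.

```python
import numpy as np


def level_coordinates(geotransform, height, width, factor):
    """Pixel-centre y/x coordinates for an overview level, in the native CRS.

    `geotransform` uses GDAL ordering: (x_origin, x_res, row_rotation,
    y_origin, column_rotation, y_res), with y_res negative for north-up grids.
    Assumes the overview shares the native top-left origin and that its pixel
    size is exactly `factor` times the native pixel size.
    """
    x0, x_res, rot_x, y0, rot_y, y_res = geotransform
    if rot_x != 0 or rot_y != 0:
        raise ValueError("rotated grids are not handled in this sketch")
    # +0.5 moves from the pixel's top-left corner to its centre.
    x = x0 + (np.arange(width) + 0.5) * x_res * factor
    y = y0 + (np.arange(height) + 0.5) * y_res * factor
    return y, x


# Hypothetical north-up UTM transform (10 m pixels); not taken from a real product.
native_gt = (600000.0, 10.0, 0.0, 5300040.0, 0.0, -10.0)
y, x = level_coordinates(native_gt, height=2, width=3, factor=2)
print(x, y)
```

Under CF, such arrays would typically carry the standard names `projection_x_coordinate` and `projection_y_coordinate`.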

Modern Implementation

  • Zarr V3 format with efficient compression
  • Consolidated metadata for fast access
  • Proper chunking for cloud-optimized access
  • Works with xarray's native plotting

Next

  • CF conventions translation

I hope this experimental implementation can provide a foundation for COG-style multiscale GeoZarr datasets for EOPF.

@maxrjones @briannapagan @vincentsarago
Please let me know if I am going in the right direction. I'm still trying to figure out all the keys here.

wietzesuijker and others added 4 commits June 6, 2025 17:07
- Implement functions to create Cloud Optimized GeoTIFF (COG) style overview levels.
- Calculate overview levels based on native dimensions and downsampling logic.
- Create overview templates maintaining native CRS and spatial attributes.
- Populate overview arrays with downsampled data using numpy methods.
- Verify coordinates and CRS in overview levels.
- Plot overview levels using xarray's native plotting capabilities.
…GeoZarr V3

This document outlines the core requirements, gaps in the current EOPF Zarr format, and the implementation process for converting EOPF datasets to comply with GeoZarr V3 standards. It includes detailed sections on CF compliance, spatial reference systems, multiscale support, and validation processes, along with a comparison between EOPF Zarr and GeoZarr V3.
"3": {}
}
}
}

Member

I thought we agreed to not use TMS grid?

Author

Sorry, this is a leftover from my previous experiment.

#### Overview Level Requirements

Each overview level MUST:
- Follow COG-style /2 downsampling (1:1, 1:2, 1:4, 1:8, etc.)

Member

The /2 downsampling is not mandatory in COGs.

Member

This is the actual requirement, right:

A COG file SHALL contain reduced resolution subfiles each one reducing the resolution by a minimum factor of 2 and a maximum factor of 10 from the previous one.
 
I think that is the type of requirement we should translate to GeoZarr.
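A hedged sketch of how the quoted COG rule might translate into a GeoZarr validation check. Applying the rule per axis and tolerating the rounding of odd sizes (ceiling/floor of the ideal size) are assumptions; `check_overview_factors` is a hypothetical name.

```python
import math


def check_overview_factors(shapes, min_factor=2.0, max_factor=10.0):
    """List violations of a COG-like rule: each overview must reduce resolution
    by a factor between min_factor and max_factor relative to the previous level.

    `shapes` is [(height, width), ...] ordered from full resolution to coarsest.
    The rule is checked per axis; an overview size equal to the ceiling/floor of
    the ideal size is accepted so odd dimensions do not trigger false failures.
    """
    problems = []
    for level in range(1, len(shapes)):
        for axis, prev, cur in zip(("height", "width"), shapes[level - 1], shapes[level]):
            largest_ok = math.ceil(prev / min_factor)            # weakest allowed reduction
            smallest_ok = max(1, math.floor(prev / max_factor))  # strongest allowed reduction
            if not smallest_ok <= cur <= largest_ok:
                problems.append(
                    f"level {level} {axis}: {prev} -> {cur} px is outside "
                    f"[{smallest_ok}, {largest_ok}]"
                )
    return problems


print(check_overview_factors([(10980, 10980), (5490, 5490), (2745, 2745), (1373, 1373)]))
print(check_overview_factors([(1000, 1000), (900, 900)]))
```

Unlike a strict /2 rule, this accepts any pyramid whose steps stay within the 2 to 10 range.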


### 3. Coordinate Arrays for All Levels

All GeoZarr V3 datasets MUST include proper coordinate arrays at every resolution level:

Member

If we require a GeoTransform, then we don't really need coordinate arrays. IMO, having those arrays at each level would be painful to handle.

Member

FYI: in COG we have one top level GeoTransform and then only the shape of the additional overviews
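To illustrate storing one top-level GeoTransform plus overview shapes, this sketch derives each overview's transform from those two pieces. It assumes a north-up grid and that every overview spans the same extent as the full-resolution image; `overview_geotransform` is hypothetical and the transform values are illustrative.

```python
def overview_geotransform(native_gt, native_shape, overview_shape):
    """Derive an overview's geotransform from the top-level one and its shape.

    Assumes a north-up grid (GDAL ordering, zero rotation terms) and that the
    overview covers the same extent as the full-resolution image, so
    pixel size = native extent / overview size along each axis.
    """
    x0, x_res, rot_x, y0, rot_y, y_res = native_gt
    if rot_x != 0 or rot_y != 0:
        raise ValueError("rotated grids are not handled in this sketch")
    height, width = native_shape
    ov_height, ov_width = overview_shape
    # Same origin; pixel size stretched so the overview spans the full extent.
    return (x0, x_res * width / ov_width, 0.0, y0, 0.0, y_res * height / ov_height)


native_gt = (600000.0, 10.0, 0.0, 5300040.0, 0.0, -10.0)  # hypothetical 10 m UTM grid
for shape in [(5490, 5490), (1373, 1373)]:
    print(shape, overview_geotransform(native_gt, (10980, 10980), shape))
```

A size that is not an exact power-of-two reduction, such as 1373 pixels, simply yields a non-integer pixel size, so no per-level coordinate arrays need to be stored.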

Member

@maxrjones left a comment

Amazing, I really appreciate you spending time on this! I only took a quick look but think it's mostly on the right track.

"spatial_ref": {
"attrs": {
"crs_wkt": "PROJCS[\"WGS 84 / UTM zone 32N\"...]",
"spatial_ref": "PROJCS[\"WGS 84 / UTM zone 32N\"...]",

Member

Suggested change
"spatial_ref": "PROJCS[\"WGS 84 / UTM zone 32N\"...]",

I think only crs_wkt should be included: spatial_ref is not found as an attribute in CF-compliant NetCDF files or as a tag in GeoTIFFs; it's a redundancy added by rioxarray for in-memory interoperability with GDAL. IMO it's best not to persist those redundancies on disk.
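A small sketch of the suggested cleanup, written against plain attribute dicts so it makes no assumptions about the rioxarray API. `clean_grid_mapping_attrs` is hypothetical. Data variables would still reference the grid-mapping variable (which may keep the name `spatial_ref`) through their `grid_mapping` attribute; only the duplicate attribute goes away.

```python
def clean_grid_mapping_attrs(attrs: dict) -> dict:
    """Return a copy of grid-mapping attributes keeping the CRS only as `crs_wkt`.

    `spatial_ref` is treated as the duplicate WKT that rioxarray writes for GDAL:
    if it is the only WKT present it is renamed to `crs_wkt`, otherwise it is
    dropped. Other attributes (e.g. CF's `grid_mapping_name`) are left untouched.
    """
    cleaned = dict(attrs)  # never mutate the caller's dict
    wkt = cleaned.pop("spatial_ref", None)
    if "crs_wkt" not in cleaned and wkt is not None:
        cleaned["crs_wkt"] = wkt
    return cleaned


wkt = "<WKT for WGS 84 / UTM zone 32N>"  # placeholder, not a real WKT string
grid_mapping_attrs = {"crs_wkt": wkt, "spatial_ref": wkt}
band_attrs = {"grid_mapping": "spatial_ref"}  # points at the grid-mapping *variable*
print(clean_grid_mapping_attrs(grid_mapping_attrs), band_attrs)
```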


| Issue | Description | Impact |
|-------|-------------|---------|
| **Missing CF Standard Names** | Variables lack `standard_name` attributes | Reduces interoperability with CF-compliant tools |

Member

What proportion of the EOPF data can be accurately described by the existing CF standard names? I was under the impression that standardizing data variables is a bit more challenging for satellite products than for climate/forecasting data.

| **Incomplete CRS Information** | CRS stored only as `proj:epsg` attribute | Limited compatibility with rioxarray and geospatial tools |
| **Missing Grid Mapping** | No `grid_mapping` attribute linking to spatial reference | Geospatial tools can't detect coordinate system |
| **No Multiscale Support** | Lacks overview levels and multiscale metadata | Poor performance for multi-scale visualization |
| **Missing Coordinate Arrays** | Overview levels lack proper x/y coordinate arrays | Cannot perform spatial operations on overview data |

Member

For data that can be described by an affine transformation (e.g., regular grids), coordinate arrays should not be required: they increase storage space, increase data transfer/load times, and can introduce numerical-precision errors relative to functional coordinates. Geospatial processing libraries can generate explicit coordinate arrays from the affine transformation when needed.


4 participants