Merged
83 commits
d737a9d
fix: add small fixes to create Items out of RDPS and HRDPS data sets …
henriaidasso Sep 23, 2025
6bde69e
feat: rdps extension added
henriaidasso Sep 23, 2025
911cbe9
feat: rdps collection config added
henriaidasso Sep 23, 2025
b5f0de9
feat: rdps implementation added
henriaidasso Sep 23, 2025
76b37a7
fix: export rdps implementation for cli access
henriaidasso Sep 23, 2025
5e5541a
fix: clean unused code comments
henriaidasso Sep 23, 2025
4d6aba0
fix: clean unused code
henriaidasso Sep 24, 2025
3866246
fix: filename for item id updated
henriaidasso Sep 24, 2025
138f0dd
fix: item validation added
henriaidasso Sep 24, 2025
cd5e801
fix: item validation added
henriaidasso Sep 24, 2025
eb86c50
fix: add properties to item
henriaidasso Sep 26, 2025
bfdd825
feat: hrdps ext. and impl. added
henriaidasso Sep 26, 2025
49804f5
fix: integrate properties
henriaidasso Sep 30, 2025
99868fc
fix: add providers handling
henriaidasso Oct 3, 2025
24066be
feat: cf extension added
henriaidasso Oct 3, 2025
90c30c1
fix: make providers optional
henriaidasso Oct 3, 2025
9ea89d7
feat: file info extension added
henriaidasso Oct 3, 2025
fc8806b
feat: dataclasses-json library added
henriaidasso Oct 7, 2025
ad2bba9
fix: contacts extension added
henriaidasso Oct 8, 2025
6c45ef2
fix: file info extension added
henriaidasso Oct 8, 2025
730763c
fix clean cf extension code
henriaidasso Oct 8, 2025
0ca5211
fix: make contacts optional
henriaidasso Oct 8, 2025
2e6a69e
Merge branch 'master' into rdps
henriaidasso Oct 8, 2025
a8c299d
fix: cordex test file fix
henriaidasso Oct 9, 2025
ec29535
fix: get assets check service_type type
henriaidasso Oct 9, 2025
933352f
Merge remote-tracking branch 'refs/remotes/origin/rdps' into rdps
henriaidasso Oct 9, 2025
27e2a40
fix: get assets check service_type type
henriaidasso Oct 9, 2025
92bc676
fix: contact data model removed
henriaidasso Oct 9, 2025
57a6836
fix: cf extension added using subclasses
henriaidasso Oct 10, 2025
d8c7b97
fix: hrdps impl. updated using subclass
henriaidasso Oct 10, 2025
2c207a7
fix: cf item extension get assets filter
henriaidasso Oct 10, 2025
c03b5cf
fix: remove dataclass-json dependency
henriaidasso Oct 10, 2025
126aa4c
doc: readme updated
henriaidasso Oct 10, 2025
40fda07
fix: file extension added using subclass
henriaidasso Oct 10, 2025
c968b28
fix: hrdps class doctstring updated
henriaidasso Oct 10, 2025
b75bb54
fix: contact rtype update
henriaidasso Oct 10, 2025
f22b369
fix: providers updated
henriaidasso Oct 15, 2025
7e6a764
fix: collection info updated
henriaidasso Oct 15, 2025
c462ecd
fix: file helper updated
henriaidasso Oct 15, 2025
e38ad09
fix: collection links updated
ahenrij Oct 16, 2025
dd7e480
fix: warning fixed
ahenrij Oct 21, 2025
4597fe9
fix: rdps tests updated
ahenrij Oct 21, 2025
29dc7e2
fix: populators updated
ahenrij Oct 21, 2025
ea75a9b
doc: comments added
ahenrij Oct 21, 2025
26d9135
fix: update link title
henriaidasso Oct 22, 2025
acaeea8
fix: add smaller size logo
ahenrij Oct 22, 2025
af3d1fe
fix: merged changes
ahenrij Oct 22, 2025
ffa1021
doc: readme updated
ahenrij Oct 23, 2025
5b60a09
fix: rdps collection info updated
ahenrij Oct 23, 2025
af32a48
fix: update model_fields access
ahenrij Oct 23, 2025
e24834a
fix: service type check in get_assets func
ahenrij Oct 23, 2025
e442359
fix: from data return type
ahenrij Oct 23, 2025
85f70f1
fix: reuse populator session in file helper
ahenrij Oct 23, 2025
faae3a1
fix: reuse populator session in file helper
ahenrij Oct 23, 2025
24a546d
fix: rdps extension updated
ahenrij Oct 23, 2025
0b10d2a
fix: collection level metadata updated
ahenrij Oct 23, 2025
0443acd
fix: update .gitignore
ahenrij Oct 23, 2025
dfe1437
fix: remove blank lines
ahenrij Oct 23, 2025
387baa7
fix: set file asset key in helper init
ahenrij Oct 23, 2025
11029e4
fix: udpate default variable values
ahenrij Oct 23, 2025
f904c4c
fix: optional file size if asset key absent
ahenrij Oct 24, 2025
01ac7da
fix: refactor helpers instantiation
ahenrij Oct 29, 2025
43a5d0b
fix: merged
ahenrij Oct 29, 2025
29b2890
doc: changes updated
ahenrij Oct 30, 2025
293a37e
doc: readme updated
ahenrij Oct 30, 2025
64e39de
fix: collection links updated
ahenrij Oct 31, 2025
660265d
fix: collection contacts updated
ahenrij Oct 31, 2025
afbe940
fix: rdps tests updated to check added fields
ahenrij Oct 31, 2025
64c33b6
fix: add crim as indexer in providers
ahenrij Oct 31, 2025
b73c216
fix: add crim as indexer in contacts only
ahenrij Oct 31, 2025
5b668e9
fix: license links updated
ahenrij Oct 31, 2025
84ab682
fix: dimensions and variables updated
ahenrij Nov 11, 2025
1a0b8aa
fix: dimensions and variables updated
ahenrij Nov 11, 2025
4366e6b
Merge branch 'master' into rdps
henriaidasso Nov 12, 2025
227a7d5
fix: precommit run
ahenrij Nov 12, 2025
30054ea
Merge branch 'master' into rdps
henriaidasso Nov 17, 2025
0dadae7
fix: updated get assets methods for uniformity
ahenrij Nov 17, 2025
49f0d4a
fix: cf iterate on values updated
ahenrij Nov 17, 2025
185ddac
fix: unit str to match validation schema
ahenrij Nov 17, 2025
8cdbc59
fix: return none for file content-length issues
ahenrij Nov 17, 2025
3f033c2
fix: remove unused future imports and adds in thredds
ahenrij Nov 17, 2025
1fea5bd
Merge branch 'master' into rdps
henriaidasso Nov 17, 2025
07696c4
fix: changes version fix
ahenrij Nov 17, 2025
13 changes: 11 additions & 2 deletions CHANGES.md
@@ -4,11 +4,20 @@
 
 <!-- insert list items of new changes here -->
 
-## [0.11.0](https://github.com/crim-ca/stac-populator/tree/0.11.0) (2025-11-17)
+* Add `RDPS_CRIM` and `HRDPS_CRIM` implementations.
+* Add `cf` extension adding CF Parameter metadata to (H)RDPS STAC assets and items.
+* Add `cf` and `file` helpers.
+* Add `providers` and `contacts` extensions metadata to (H)RDPS STAC collection.
+* Fix deprecated access to `model_fields` in `BaseSTAC` data model class.
+* Fix service type check bug in extensions' `get_assets` methods.
+* Fix return type of `from_data` in `THREDDSCatalogDataModel`.
+* Update RDPS and HRDPS tests.
+
+## [1.11.0](https://github.com/crim-ca/stac-populator/tree/1.11.0) (2025-11-17)
 
 * Add option to automatically update collection extents and summaries based on ingested items.
 
-## [0.10.0](https://github.com/crim-ca/stac-populator/tree/0.10.0) (2025-11-11)
+## [1.10.0](https://github.com/crim-ca/stac-populator/tree/1.10.0) (2025-11-11)
 
 * Add `pre-commit` linting rules (code format + STAC field sorting in JSON).
 * Add initial RDPS and HRDPS examples with minimal metadata from Ouranos THREDDS samples.
4 changes: 4 additions & 0 deletions README.md
@@ -28,10 +28,14 @@ Provided implementations of `STACpopulatorBase`:
 
 | Implementation | Description |
 |----------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
+| [RDPS_CRIM][RDPS_CRIM] | Crawls a THREDDS Catalog for RDPS NCML-annotated NetCDF references to publish corresponding STAC Collection and Items. |

Collaborator

Add another entry for HRDPS_CRIM. The description can simply highlight the difference compared to RDPS_CRIM.

+| [HRDPS_CRIM][HRDPS_CRIM] | Crawls a THREDDS Catalog for HRDPS NCML-annotated NetCDF references to publish corresponding STAC Collection and Items. |
 | [CMIP6_UofT][CMIP6_UofT] | Crawls a THREDDS Catalog for CMIP6 NCML-annotated NetCDF references to publish corresponding STAC Collection and Items. |
 | [DirectoryLoader][DirLoader] | Crawls a subdirectory hierarchy of pre-generated STAC Collections and Items to publish to a STAC API endpoint. |
 | [CORDEX-CMIP6_Ouranos][CORDEX-CMIP6_Ouranos] | Crawls a THREDDS Catalog for CORDEX-CMIP6 NetCDF references to publish corresponding STAC Collection and Items. |
 
+[RDPS_CRIM]: STACpopulator/implementations/RDPS_CRIM/add_RDPS.py
+[HRDPS_CRIM]: STACpopulator/implementations/HRDPS_CRIM/add_HRDPS.py
 [CMIP6_UofT]: STACpopulator/implementations/CMIP6_UofT/add_CMIP6.py
 [DirLoader]: STACpopulator/implementations/DirectoryLoader/crawl_directory.py
 [CORDEX-CMIP6_Ouranos]: STACpopulator/implementations/CORDEX-CMIP6_Ouranos/add_CORDEX-CMIP6.py
17 changes: 14 additions & 3 deletions STACpopulator/extensions/base.py
@@ -64,6 +64,16 @@
 class Helper:
     """Class to be subclassed by extension helpers."""
 
+    @classmethod
+    @abstractmethod
+    def from_data(
+        cls,
+        data: dict[str, Any],
+        **kwargs,
+    ) -> "Helper":
+        """Create a Helper instance from raw data."""
+        pass
+
 
 class ExtensionHelper(BaseModel, Helper):
     """Base class for dataset properties going into the catalog.
@@ -190,7 +200,8 @@ def create_uid(self) -> str:
     @model_validator(mode="after")
     def find_helpers(self) -> "BaseSTAC":
         """Populate the list of extensions."""
-        for key, field in self.model_fields.items():
+        # Access model fields from class. From obj will be removed in pydantic v3
+        for key, field in type(self).model_fields.items():
             if isinstance(field.annotation, type) and issubclass(field.annotation, Helper):
                 self._helpers.append(key)
         return self
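
The two changes to this file so far (an abstract `from_data` factory on `Helper`, and helper discovery driven by class-level field annotations) can be illustrated without pydantic. The following is only a sketch under stated assumptions: it uses `dataclasses` and `typing.get_type_hints` as a stand-in for pydantic's `model_fields`, and `UnitsHelper`, `Model`, and the module-level `find_helpers` function are hypothetical names that do not exist in the PR.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, get_type_hints


class Helper(ABC):
    """Stand-in for the PR's Helper base class (assumption: plain ABC, no pydantic)."""

    @classmethod
    @abstractmethod
    def from_data(cls, data: dict[str, Any], **kwargs: Any) -> "Helper":
        """Create a helper instance from raw data."""


@dataclass
class UnitsHelper(Helper):
    """Hypothetical helper that only records the variable names."""

    names: list[str]

    @classmethod
    def from_data(cls, data: dict[str, Any], **kwargs: Any) -> "UnitsHelper":
        # Same access path as the helpers in the PR: data["data"]["variables"].
        return cls(names=sorted(data["data"]["variables"]))


@dataclass
class Model:
    """Hypothetical data model with one helper field and one plain field."""

    title: str
    units: UnitsHelper


def find_helpers(model_cls: type) -> list[str]:
    """Return names of fields annotated with a Helper subclass.

    The annotations are read from the class, not from an instance,
    which mirrors the switch to ``type(self).model_fields``.
    """
    return [
        key
        for key, annotation in get_type_hints(model_cls).items()
        if isinstance(annotation, type) and issubclass(annotation, Helper)
    ]
```

Here `find_helpers(Model)` finds only the `units` field, and each discovered helper can then be built uniformly through its own `from_data`.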
@@ -328,8 +339,8 @@ def get_assets(
         return {
             key: asset
             for key, asset in self.item.get_assets().items()
-            if (service_type is ServiceType and service_type.value in asset.extra_fields)
-            or any(ServiceType.from_value(field, default=None) is ServiceType for field in asset.extra_fields)
+            if (isinstance(service_type, ServiceType) and service_type.value in asset.extra_fields)
+            or any(ServiceType.from_value(field, default=False) for field in asset.extra_fields)
         }
 
     def __repr__(self) -> str:
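
To see why the `get_assets` condition needed this fix, note that an enum member is never identical to its enum class, so `service_type is ServiceType` is always false for a member. The sketch below reproduces both versions of the condition against a hypothetical `ServiceType` enum. The member names and the body of `from_value` are assumptions made for illustration; only the call shape `from_value(field, default=...)` comes from the diff.

```python
from enum import Enum
from typing import Any, Optional


class ServiceType(Enum):
    """Hypothetical subset of the project's ServiceType enum."""

    httpserver = "HTTPServer"
    opendap = "OPENDAP"

    @classmethod
    def from_value(cls, value: str, default: Any = None) -> Any:
        """Return the member matching ``value``, or ``default`` if none matches."""
        for member in cls:
            if member.value.lower() == value.lower():
                return member
        return default


def old_filter(service_type: Optional[ServiceType], extra_fields: dict[str, Any]) -> bool:
    # Pre-fix condition: a member is never the class itself,
    # so both clauses evaluate to False for every asset.
    return (service_type is ServiceType and service_type.value in extra_fields) or any(
        ServiceType.from_value(field, default=None) is ServiceType for field in extra_fields
    )


def new_filter(service_type: Optional[ServiceType], extra_fields: dict[str, Any]) -> bool:
    # Post-fix condition: isinstance() checks membership, and the truthiness of
    # from_value() (member or False) tells whether a field names a service type.
    return (isinstance(service_type, ServiceType) and service_type.value in extra_fields) or any(
        ServiceType.from_value(field, default=False) for field in extra_fields
    )
```

With `extra_fields = {"HTTPServer": {...}}`, `old_filter` rejects the asset while `new_filter` keeps it, which is consistent with the changelog entry about the service type check.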
239 changes: 239 additions & 0 deletions STACpopulator/extensions/cf.py
Collaborator

This is good and aligned with https://github.com/stac-utils/pystac/tree/main/pystac/extensions.

Maybe consider opening a PR directly over there and pushing the change upstream.
Then, this code would only have to define the "helper" portion.

Collaborator Author

For cross-reference, I created this PR: stac-utils/pystac#1592

Collaborator

The pyproject.toml can be updated to point at the PR, using the latest commit reference. We can update it again once it gets integrated.

Collaborator Author

You mean using the PR code version directly to import cf from pystac?

Collaborator

Yes. Using the latest commit hash of the branch.

Collaborator Author

Just did, but the presence of the pystac-client package enforces the installation of the pystac stable 1.14.1 version anyway (which makes sense).

I can think of (1) adding a constraints.txt file to force installation or (2) manually installing the PR's latest commit version of pystac, but I would have to document it at least until the PR is merged. Both of these options seem a little too much to me. What do you think?

Collaborator

The pystac-client pyproject file only requires pystac[validation]>=1.10.0. It should be possible to have pystac pinned to a specific commit using something like pystac @ git+https://github.com/stac-utils/pystac.git@<commit-hash>.

Try this first, and if that doesn't work, we'll see how to handle it.

Collaborator Author

Yes, that's what I did, and it didn't work.

Collaborator

OK. I suggest we wait and see what comes out of stac-utils/pystac#1592 (comment). Maybe it could be done fairly soon and avoid the issue altogether.

If not, I guess it would be easier to keep the code as is in the meantime with a FIXME note until the PR is merged. Since the commit reference is not a simple replacement in the dependencies, I think other workarounds would imply too many changes or manual steps leading to setup errors.

@@ -0,0 +1,239 @@
"""CF Extension Module."""

from __future__ import annotations

import functools
from typing import (
    Any,
    Dict,
    Generic,
    Iterable,
    List,
    Literal,
    Optional,
    TypeVar,
    Union,
    cast,
    get_args,
)

import pystac
from pydantic import BaseModel
from pystac.extensions import item_assets
from pystac.extensions.base import ExtensionManagementMixin, PropertiesExtension

from STACpopulator.extensions.base import ExtensionHelper
from STACpopulator.stac_utils import ServiceType

T = TypeVar("T", pystac.Collection, pystac.Item, pystac.Asset)
SchemaName = Literal["cf"]
SCHEMA_URI = "https://stac-extensions.github.io/cf/v0.2.0/schema.json"
PREFIX = f"{get_args(SchemaName)[0]}:"
PARAMETER_PROP = PREFIX + "parameter"


class CFParameter(BaseModel):
    """CFParameter."""

    name: str
    unit: str

    def __repr__(self) -> str:
        """Return string repr."""
        return f"<CFParameter name={self.name}, unit={self.unit}>"


class CFHelper(ExtensionHelper):
    """CFHelper."""

    _prefix: str = "cf"
    variables: Dict[str, Any]

    @functools.cached_property
    def parameters(self) -> List[CFParameter]:
        """Extract cf:parameter-like information from item_data."""
        parameters = []

        for var in self.variables.values():
            attrs = var.get("attributes", {})
            name = attrs.get("standard_name")  # Get the required standard name
            if not name:
                continue  # Skip if no valid name
            unit = attrs.get("units", "")
            parameters.append(CFParameter(name=name, unit=unit))

        return parameters

    @classmethod
    def from_data(
        cls,
        data: dict[str, Any],
        **kwargs,
    ) -> "CFHelper":
        """Create a CFHelper instance from raw data."""
        return cls(variables=data["data"]["variables"], **kwargs)

    def apply(self, item: T, add_if_missing: bool = True) -> T:
        """Apply the CF extension to an item."""
        ext = CFExtension.ext(item, add_if_missing=add_if_missing)
        ext.apply(parameters=self.parameters)

        # FIXME: This temporary workaround has been added to comply with the (most certainly buggy) validation schema for CF extension
        # It should be removed once the PR is integrated since applying on the item should be enough
        asset = item.assets["HTTPServer"]
        cf_asset_ext = CFExtension.ext(asset, add_if_missing=True)
        cf_asset_ext.apply(parameters=self.parameters)
        return item


class CFExtension(
    Generic[T],
    PropertiesExtension,
    ExtensionManagementMixin[Union[pystac.Asset, pystac.Item, pystac.Collection]],
):
    """CF Metadata Extension."""

    @property
    def name(self) -> SchemaName:
        """Return the schema name."""
        return get_args(SchemaName)[0]

    @property
    def parameter(self) -> List[dict[str, Any]] | None:
        """Get or set the CF parameter(s)."""
        return self._get_property(PARAMETER_PROP, int)

    @parameter.setter
    def parameter(self, v: List[dict[str, Any]] | None) -> None:
        self._set_property(PARAMETER_PROP, v)

    def apply(
        self,
        parameters: Union[List[CFParameter], List[dict[str, Any]]],
    ) -> None:
        """Apply CF Extension properties to the extended :class:`~pystac.Item` or :class:`~pystac.Asset`."""
        if not isinstance(parameters[0], dict):
            parameters = [p.model_dump() for p in parameters]
        self.parameter = parameters

    @classmethod
    def get_schema_uri(cls) -> str:
        """Return this extension's schema URI."""
        return SCHEMA_URI

    @classmethod
    def ext(cls, obj: T, add_if_missing: bool = False) -> CFExtension[T]:
        """Extend the given STAC Object with properties from the :stac-ext:`CF Extension <cf>`.

        This extension can be applied to instances of :class:`~pystac.Item`, :class:`~pystac.Asset`, or :class:`~pystac.Collection`.

        Raises
        ------
        pystac.ExtensionTypeError : If an invalid object type is passed.
        """
        if isinstance(obj, pystac.Collection):
            cls.ensure_has_extension(obj, add_if_missing)
            return cast(CFExtension[T], CollectionCFExtension(obj))
        elif isinstance(obj, pystac.Item):
            cls.ensure_has_extension(obj, add_if_missing)
            return cast(CFExtension[T], ItemCFExtension(obj))
        elif isinstance(obj, pystac.Asset):
            cls.ensure_owner_has_extension(obj, add_if_missing)
            return cast(CFExtension[T], AssetCFExtension(obj))
        elif isinstance(obj, item_assets.AssetDefinition):
            cls.ensure_owner_has_extension(obj, add_if_missing)
            return cast(CFExtension[T], ItemAssetsCFExtension(obj))
        else:
            raise pystac.ExtensionTypeError(cls._ext_error_message(obj))


class ItemCFExtension(CFExtension[pystac.Item]):
    """
    A concrete implementation of :class:`CFExtension` on an :class:`~pystac.Item`.

    Extends the properties of the Item to include properties defined in the
    :stac-ext:`CF Extension <cf>`.

    This class should generally not be instantiated directly. Instead, call
    :meth:`CFExtension.ext` on an :class:`~pystac.Item` to extend it.
    """

    def __init__(self, item: pystac.Item) -> None:
        self.item = item
        self.properties = item.properties

    def get_assets(
        self,
        service_type: Optional[ServiceType] = None,
    ) -> dict[str, pystac.Asset]:
        """Get the item's assets where service types are defined.

        Args:
            service_type: If set, filter the assets such that only those with a
                matching :class:`~STACpopulator.stac_utils.ServiceType` are returned.

        Returns
        -------
            Dict[str, Asset]: A dictionary of assets that match ``service_type``
                if set or else all of this item's assets where service types are defined.
        """
        return {
            key: asset
            for key, asset in self.item.get_assets().items()
            if (isinstance(service_type, ServiceType) and service_type.value in asset.extra_fields)
            or any(ServiceType.from_value(field, default=False) for field in asset.extra_fields)
        }

    def __repr__(self) -> str:
        """Return repr."""
        return f"<ItemCFExtension Item id={self.item.id}>"


class ItemAssetsCFExtension(CFExtension[item_assets.AssetDefinition]):
    """Extension for CF item assets."""

    properties: dict[str, Any]
    asset_defn: item_assets.AssetDefinition

    def __init__(self, item_asset: item_assets.AssetDefinition) -> None:
        self.asset_defn = item_asset
        self.properties = item_asset.properties


class AssetCFExtension(CFExtension[pystac.Asset]):
    """
    A concrete implementation of :class:`CFExtension` on an :class:`~pystac.Asset`.

    Extends the Asset fields to include properties defined in the
    :stac-ext:`CF Extension <cf>`.

    This class should generally not be instantiated directly. Instead, call
    :meth:`CFExtension.ext` on an :class:`~pystac.Asset` to extend it.
    """

    asset_href: str
    """The ``href`` value of the :class:`~pystac.Asset` being extended."""

    properties: dict[str, Any]
    """The :class:`~pystac.Asset` fields, including extension properties."""

    additional_read_properties: Optional[Iterable[dict[str, Any]]] = None
    """If present, this will be a list containing 1 dictionary representing the
    properties of the owning :class:`~pystac.Item`."""

    def __init__(self, asset: pystac.Asset) -> None:
        self.asset_href = asset.href
        self.properties = asset.extra_fields
        if asset.owner and isinstance(asset.owner, pystac.Item):
            self.additional_read_properties = [asset.owner.properties]

    def __repr__(self) -> str:
        """Return repr."""
        return f"<AssetCFExtension Asset href={self.asset_href}>"


class CollectionCFExtension(CFExtension[pystac.Collection]):
    """Extension for CF data."""

    def __init__(self, collection: pystac.Collection) -> None:
        self.collection = collection
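
The extraction logic in `CFHelper.parameters` can be tried in isolation. The sketch below is a dependency-free approximation, not the PR's code: it swaps the pydantic `CFParameter` model for a `dataclass`, replaces the cached property with a hypothetical `extract_parameters` function, and uses made-up sample variables.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class CFParameter:
    """Dataclass stand-in for the pydantic CFParameter model (assumption)."""

    name: str
    unit: str


def extract_parameters(variables: dict[str, Any]) -> list[CFParameter]:
    """Collect one CFParameter per variable that declares a CF standard_name."""
    parameters = []
    for var in variables.values():
        attrs = var.get("attributes", {})
        name = attrs.get("standard_name")
        if not name:
            continue  # variables without a standard name are skipped
        parameters.append(CFParameter(name=name, unit=attrs.get("units", "")))
    return parameters


# Made-up sample input, shaped like data["data"]["variables"] in the helper.
variables = {
    "tas": {"attributes": {"standard_name": "air_temperature", "units": "K"}},
    "rotated_pole": {"attributes": {"grid_mapping_name": "rotated_latitude_longitude"}},
}

# The extension stores the parameters as plain dicts under the "cf:parameter" key.
properties = {"cf:parameter": [asdict(p) for p in extract_parameters(variables)]}
```

Only `tas` carries a `standard_name`, so `properties` ends up with a single `cf:parameter` entry; `rotated_pole` is skipped.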
4 changes: 2 additions & 2 deletions STACpopulator/extensions/cmip6.py
@@ -308,8 +308,8 @@ def get_assets(
         return {
             key: asset
             for key, asset in self.item.get_assets().items()
-            if (service_type is ServiceType and service_type.value in asset.extra_fields)
-            or any(ServiceType.from_value(field, default=None) is ServiceType for field in asset.extra_fields)
+            if (isinstance(service_type, ServiceType) and service_type.value in asset.extra_fields)
+            or any(ServiceType.from_value(field, default=False) for field in asset.extra_fields)
         }
 
     def __repr__(self) -> str:
13 changes: 12 additions & 1 deletion STACpopulator/extensions/datacube.py
@@ -141,6 +141,15 @@ def __init__(self, attrs: MutableMapping[str, Any]) -> None:
             },
         }
 
+    @classmethod
+    def from_data(
+        cls,
+        data: dict[str, Any],
+        **kwargs,
+    ) -> "DataCubeHelper":
+        """Create a DataCubeHelper instance from raw data."""
+        return cls(attrs=data["data"])
+
     @property
     @functools.cache
     def dimensions(self) -> dict[str, Dimension]:
@@ -213,9 +222,11 @@ def variables(self) -> dict[str, Variable]:
             else:
                 dtype = VariableType.DATA.value
 
+            dimensions = meta.get("shape", [])
+
             variables[name] = Variable(
                 properties=dict(
-                    dimensions=meta["shape"],
+                    dimensions=[] if dimensions == [""] else dimensions,

Collaborator Author

henriaidasso Nov 11, 2025

@fmigneault this change fixes the empty string dimension issue in variables. Variables without dimensions now show up as below.

  "rotated_pole": {
    "type": "data",
    "unit": "",
    "dimensions": [], # instead of [""] previously
    "description": ""
  },

For the extent attribute in dimensions, when I omit setting it for the Z axis (i.e., not putting it in the properties dict), extent still appears with the value [null, null] in the item's JSON. When I use the static method Dimension.from_dict(...) to cast the dimension object to a VerticalSpatialDimension, a missing extent throws a RequiredPropertyMissing error, which does not align with the DataCube specification where extent is optional for a VerticalSpatialDimension. I suggest we leave the current default value [null, null] and update later.

Collaborator

Nice finding. 👍
Yes, let's move with [null, null] in the meantime.

I logged the issue: stac-utils/pystac#1593


                     type=dtype,
                     description=attrs.get("description", attrs.get("long_name", "")),
                     unit=attrs.get("units", ""),
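
The dimension fix discussed in the review comment above comes down to a small normalization step. As a minimal sketch, the hypothetical function `normalize_dimensions` below wraps the two changed lines; in the PR itself the logic is inline in the `variables` property.

```python
from typing import Any


def normalize_dimensions(meta: dict[str, Any]) -> list[str]:
    """Return the variable's dimensions, mapping the [""] placeholder to [].

    A missing "shape" key also yields an empty list, instead of the KeyError
    that the previous ``meta["shape"]`` lookup would raise.
    """
    dimensions = meta.get("shape", [])
    return [] if dimensions == [""] else dimensions
```

For a scalar variable such as `rotated_pole`, `normalize_dimensions({"shape": [""]})` yields `[]`, while a regular shape such as `["time", "rlat", "rlon"]` is returned unchanged.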