Merged
Changes from 23 commits
Commits
83 commits
d737a9d
fix: add small fixes to create Items out of RDPS and HRDPS data sets …
henriaidasso Sep 23, 2025
6bde69e
feat: rdps extension added
henriaidasso Sep 23, 2025
911cbe9
feat: rdps collection config added
henriaidasso Sep 23, 2025
b5f0de9
feat: rdps implementation added
henriaidasso Sep 23, 2025
76b37a7
fix: export rdps implementation for cli access
henriaidasso Sep 23, 2025
5e5541a
fix: clean unused code comments
henriaidasso Sep 23, 2025
4d6aba0
fix: clean unused code
henriaidasso Sep 24, 2025
3866246
fix: filename for item id updated
henriaidasso Sep 24, 2025
138f0dd
fix: item validation added
henriaidasso Sep 24, 2025
cd5e801
fix: item validation added
henriaidasso Sep 24, 2025
eb86c50
fix: add properties to item
henriaidasso Sep 26, 2025
bfdd825
feat: hrdps ext. and impl. added
henriaidasso Sep 26, 2025
49804f5
fix: integrate properties
henriaidasso Sep 30, 2025
99868fc
fix: add providers handling
henriaidasso Oct 3, 2025
24066be
feat: cf extension added
henriaidasso Oct 3, 2025
90c30c1
fix: make providers optional
henriaidasso Oct 3, 2025
9ea89d7
feat: file info extension added
henriaidasso Oct 3, 2025
fc8806b
feat: dataclasses-json library added
henriaidasso Oct 7, 2025
ad2bba9
fix: contacts extension added
henriaidasso Oct 8, 2025
6c45ef2
fix: file info extension added
henriaidasso Oct 8, 2025
730763c
fix: clean cf extension code
henriaidasso Oct 8, 2025
0ca5211
fix: make contacts optional
henriaidasso Oct 8, 2025
2e6a69e
Merge branch 'master' into rdps
henriaidasso Oct 8, 2025
a8c299d
fix: cordex test file fix
henriaidasso Oct 9, 2025
ec29535
fix: get assets check service_type type
henriaidasso Oct 9, 2025
933352f
Merge remote-tracking branch 'refs/remotes/origin/rdps' into rdps
henriaidasso Oct 9, 2025
27e2a40
fix: get assets check service_type type
henriaidasso Oct 9, 2025
92bc676
fix: contact data model removed
henriaidasso Oct 9, 2025
57a6836
fix: cf extension added using subclasses
henriaidasso Oct 10, 2025
d8c7b97
fix: hrdps impl. updated using subclass
henriaidasso Oct 10, 2025
2c207a7
fix: cf item extension get assets filter
henriaidasso Oct 10, 2025
c03b5cf
fix: remove dataclass-json dependency
henriaidasso Oct 10, 2025
126aa4c
doc: readme updated
henriaidasso Oct 10, 2025
40fda07
fix: file extension added using subclass
henriaidasso Oct 10, 2025
c968b28
fix: hrdps class docstring updated
henriaidasso Oct 10, 2025
b75bb54
fix: contact rtype update
henriaidasso Oct 10, 2025
f22b369
fix: providers updated
henriaidasso Oct 15, 2025
7e6a764
fix: collection info updated
henriaidasso Oct 15, 2025
c462ecd
fix: file helper updated
henriaidasso Oct 15, 2025
e38ad09
fix: collection links updated
ahenrij Oct 16, 2025
dd7e480
fix: warning fixed
ahenrij Oct 21, 2025
4597fe9
fix: rdps tests updated
ahenrij Oct 21, 2025
29dc7e2
fix: populators updated
ahenrij Oct 21, 2025
ea75a9b
doc: comments added
ahenrij Oct 21, 2025
26d9135
fix: update link title
henriaidasso Oct 22, 2025
acaeea8
fix: add smaller size logo
ahenrij Oct 22, 2025
af3d1fe
fix: merged changes
ahenrij Oct 22, 2025
ffa1021
doc: readme updated
ahenrij Oct 23, 2025
5b60a09
fix: rdps collection info updated
ahenrij Oct 23, 2025
af32a48
fix: update model_fields access
ahenrij Oct 23, 2025
e24834a
fix: service type check in get_assets func
ahenrij Oct 23, 2025
e442359
fix: from data return type
ahenrij Oct 23, 2025
85f70f1
fix: reuse populator session in file helper
ahenrij Oct 23, 2025
faae3a1
fix: reuse populator session in file helper
ahenrij Oct 23, 2025
24a546d
fix: rdps extension updated
ahenrij Oct 23, 2025
0b10d2a
fix: collection level metadata updated
ahenrij Oct 23, 2025
0443acd
fix: update .gitignore
ahenrij Oct 23, 2025
dfe1437
fix: remove blank lines
ahenrij Oct 23, 2025
387baa7
fix: set file asset key in helper init
ahenrij Oct 23, 2025
11029e4
fix: update default variable values
ahenrij Oct 23, 2025
f904c4c
fix: optional file size if asset key absent
ahenrij Oct 24, 2025
01ac7da
fix: refactor helpers instantiation
ahenrij Oct 29, 2025
43a5d0b
fix: merged
ahenrij Oct 29, 2025
29b2890
doc: changes updated
ahenrij Oct 30, 2025
293a37e
doc: readme updated
ahenrij Oct 30, 2025
64e39de
fix: collection links updated
ahenrij Oct 31, 2025
660265d
fix: collection contacts updated
ahenrij Oct 31, 2025
afbe940
fix: rdps tests updated to check added fields
ahenrij Oct 31, 2025
64c33b6
fix: add crim as indexer in providers
ahenrij Oct 31, 2025
b73c216
fix: add crim as indexer in contacts only
ahenrij Oct 31, 2025
5b668e9
fix: license links updated
ahenrij Oct 31, 2025
84ab682
fix: dimensions and variables updated
ahenrij Nov 11, 2025
1a0b8aa
fix: dimensions and variables updated
ahenrij Nov 11, 2025
4366e6b
Merge branch 'master' into rdps
henriaidasso Nov 12, 2025
227a7d5
fix: precommit run
ahenrij Nov 12, 2025
30054ea
Merge branch 'master' into rdps
henriaidasso Nov 17, 2025
0dadae7
fix: updated get assets methods for uniformity
ahenrij Nov 17, 2025
49f0d4a
fix: cf iterate on values updated
ahenrij Nov 17, 2025
185ddac
fix: unit str to match validation schema
ahenrij Nov 17, 2025
8cdbc59
fix: return none for file content-length issues
ahenrij Nov 17, 2025
3f033c2
fix: remove unused future imports and adds in thredds
ahenrij Nov 17, 2025
1fea5bd
Merge branch 'master' into rdps
henriaidasso Nov 17, 2025
07696c4
fix: changes version fix
ahenrij Nov 17, 2025
3 changes: 3 additions & 0 deletions .gitignore
@@ -34,3 +34,6 @@ build
# Old Submodule Path
# Could be used locally
pyessv-archive/

# Temp files downloaded
temp_files/

Contributor

Is this a new location where we're storing temporary files on disk? Why not use the user's temp directory, or their cache directory if you want these files to persist longer?
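A minimal sketch of what the reviewer suggests, using only the standard library: a fresh per-run directory under the system temp location, or a per-user cache directory when files should persist. The function name download_dir, the "STACpopulator" directory name and the XDG-style cache fallback are assumptions, not existing STACpopulator code; on Windows or macOS a library such as platformdirs (user_cache_dir) would give the platform's native cache location instead.

```python
import os
import tempfile
from pathlib import Path


def download_dir(persist: bool = False, app_name: str = "STACpopulator") -> Path:
    """Return a directory for downloaded files (hypothetical helper, not part of STACpopulator).

    persist=False: a new, uniquely named directory under the system temp dir.
    persist=True: a per-user cache directory that survives across runs.
    """
    if not persist:
        # mkdtemp creates the directory with access restricted to the current user
        return Path(tempfile.mkdtemp(prefix=f"{app_name}-"))
    # XDG convention on POSIX; fall back to ~/.cache when XDG_CACHE_HOME is unset or empty
    cache_root = Path(os.environ.get("XDG_CACHE_HOME") or Path.home() / ".cache")
    cache_dir = cache_root / app_name
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir
```

With this approach, nothing needs to be added to .gitignore, since downloads never land inside the repository.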

236 changes: 236 additions & 0 deletions STACpopulator/extensions/cf.py

Collaborator

This is good and aligned with https://github.com/stac-utils/pystac/tree/main/pystac/extensions.

Maybe consider opening a PR directly over there to push the change upstream.
Then, this code would only have to define the "helper" portion.

Collaborator Author

For cross ref, I created this PR: stac-utils/pystac#1592

Collaborator

The pyproject.toml can be updated to point at the PR, using the latest commit reference. We can update it again once it gets integrated.

Collaborator Author

You mean using the PR code version directly to import cf from pystac?

Collaborator

Yes. Using the latest commit hash of the branch.

Collaborator Author

Just did, but the presence of the pystac-client package enforces the installation of the pystac stable 1.14.1 version anyway (which makes sense).

I can think of (1) adding a constraints.txt file to force installation or (2) manually installing the PR's latest commit version of pystac but I would have to document it at least until the PR is merged. Both these options seem a little too much to me. What do you think?

Collaborator

The pystac-client pyproject file only requires pystac[validation]>=1.10.0. It should be possible to have pystac pinned to a specific commit using something like pystac @ git+https://github.com/stac-utils/pystac.git@<commit-hash>.

Try this first; if that doesn't work, we will see how to handle it.

Collaborator Author

Yes that's what I did and it didn't work

Collaborator

OK. I suggest we wait and see what comes out from stac-utils/pystac#1592 (comment). Maybe it could be done fairly soon and avoid the issue altogether.

If not, I guess it would be easier to keep the code as is in the meantime, with a FIXME note until the PR is merged. Since the commit reference is not a simple replacement in the dependencies, I think other workarounds would require too many changes or manual steps that could lead to setup errors.
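If the local copy is kept with a FIXME, one low-friction option is a guarded import that prefers an upstream module once it exists and falls back to the local one otherwise. This is a sketch under assumptions: import_first is a hypothetical helper, and pystac.extensions.cf is only a guess at where stac-utils/pystac#1592 would place the extension, so the real call is left commented out.

```python
import importlib
from types import ModuleType


def import_first(*module_names: str) -> ModuleType:
    """Return the first module from module_names that can be imported (hypothetical helper)."""
    failures = []
    for module_name in module_names:
        try:
            return importlib.import_module(module_name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            failures.append(f"{module_name}: {exc}")
    raise ImportError("no candidate module could be imported (" + "; ".join(failures) + ")")


# FIXME: remove the fallback once stac-utils/pystac#1592 is merged and released.
# "pystac.extensions.cf" is an assumed upstream location, not a confirmed one.
# cf = import_first("pystac.extensions.cf", "STACpopulator.extensions.cf")
```

The trade-off is that a genuine ImportError raised inside the upstream module would also trigger the fallback silently, so this is best treated as temporary.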

@@ -0,0 +1,236 @@
"""CF Extension Module."""

from __future__ import annotations

import functools
from dataclasses import dataclass
from typing import (
    Any,
    Generic,
    Iterable,
    List,
    Literal,
    Optional,
    TypeVar,
    Union,
    cast,
    get_args,
)

import pystac
from dataclasses_json import dataclass_json
from pystac.extensions import item_assets
from pystac.extensions.base import ExtensionManagementMixin, PropertiesExtension, S

from STACpopulator.stac_utils import ServiceType

T = TypeVar("T", pystac.Collection, pystac.Item, pystac.Asset, item_assets.AssetDefinition)
SchemaName = Literal["cf"]
SCHEMA_URI = "https://stac-extensions.github.io/cf/v0.2.0/schema.json"
PREFIX = f"{get_args(SchemaName)[0]}:"
PARAMETER_PROP = PREFIX + "parameter"


def add_ext_prefix(name: str) -> str:
    """Return the given name prefixed with this extension's prefix, unless it is a datetime field."""
    return PREFIX + name if "datetime" not in name else name


@dataclass_json
@dataclass
class CFParameter:
    """A CF parameter: a standard name and its unit."""

    name: str
    unit: Optional[str]

    def __repr__(self) -> str:
        """Return string repr."""
        return f"<CFParameter name={self.name}, unit={self.unit}>"


class CFHelper:
    """Helper extracting CF parameters from the variables of a STAC item."""

    def __init__(self, variables: dict[str, Any]) -> None:
        """Take a STAC item's variables to identify CF parameter metadata."""
        self.variables = variables

    @functools.cached_property
    def parameters(self) -> List[CFParameter]:
        """Extract cf:parameter-like information from the item variables."""
        parameters = []

        for _, var in self.variables.items():

Contributor

Just use .values() if you're not going to use the key. It makes the intention of the code clearer.

            attrs = var.get("attributes", {})
            name = attrs.get("standard_name")  # Get the required standard name
            if not name:
                # Skip if no valid name
                continue

            unit = attrs.get("units") or ""

Contributor

The CFParameter.unit is annotated as Optional[str], but here we're not accepting a None value. Do we want to make a distinction between None and the empty string? If not, let's choose to represent nulls as either None or the empty string and make sure the values and annotations are consistent.

Collaborator Author
henriaidasso, Nov 17, 2025

Yes, there is a mismatch between the CF extension specification, where unit appears to be optional, and its validation schema, which requires unit to be a string. I will update the annotation to str and raise an issue on the CF extension.

            parameters.append(CFParameter(name=name, unit=unit))

        return parameters
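A minimal sketch of the parameter extraction with both review points above applied: iterating over .values(), and a single str representation for units ("" when absent), matching the validation schema as described in the author's reply. It uses the standard library's dataclasses.asdict instead of dataclasses_json's to_dict; the name extract_cf_parameters and the sample variables are illustrative only.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass
class CFParameter:
    """One cf:parameter entry; unit is always a str, with "" meaning "no unit"."""

    name: str
    unit: str = ""


def extract_cf_parameters(variables: Dict[str, Dict[str, Any]]) -> List[CFParameter]:
    """Collect CF parameters from variables shaped like {"tas": {"attributes": {...}}}."""
    parameters = []
    for var in variables.values():  # the variable names (keys) are not needed
        attrs = var.get("attributes", {})
        name = attrs.get("standard_name")
        if not name:
            continue  # skip variables without a CF standard_name
        # A missing or null "units" attribute becomes "", so every value matches the str annotation
        parameters.append(CFParameter(name=name, unit=attrs.get("units") or ""))
    return parameters


variables = {
    "tas": {"attributes": {"standard_name": "air_temperature", "units": "K"}},
    "lat_bnds": {"attributes": {}},
    "pr": {"attributes": {"standard_name": "precipitation_flux", "units": None}},
}
params = [asdict(p) for p in extract_cf_parameters(variables)]
# [{'name': 'air_temperature', 'unit': 'K'}, {'name': 'precipitation_flux', 'unit': ''}]
```

The resulting list of dicts has the List[dict[str, Any]] shape that CFExtension.apply already accepts.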


class CFExtension(
    Generic[T],
    PropertiesExtension,
    ExtensionManagementMixin[Union[pystac.Asset, pystac.Item, pystac.Collection]],
):
    """CF Metadata Extension."""

    @property
    def name(self) -> SchemaName:
        """Return the schema name."""
        return get_args(SchemaName)[0]

    @property
    def parameter(self) -> List[dict[str, Any]] | None:
        """Get or set the CF parameter(s)."""
        return self._get_property(PARAMETER_PROP, list)

    @parameter.setter
    def parameter(self, v: List[dict[str, Any]] | None) -> None:
        self._set_property(PARAMETER_PROP, v)

    def apply(
        self,
        parameters: Union[List[CFParameter], List[dict[str, Any]]],
    ) -> None:
        """Apply CF Extension properties to the extended :class:`~pystac.Item` or :class:`~pystac.Asset`."""
        # Convert each CFParameter to a dict; checking every element (not only parameters[0])
        # also avoids an IndexError when the list is empty
        self.parameter = [p if isinstance(p, dict) else p.to_dict() for p in parameters]

    @classmethod
    def get_schema_uri(cls) -> str:
        """Return this extension's schema URI."""
        return SCHEMA_URI

    @classmethod
    def has_extension(cls, obj: S) -> bool:
        """Return True iff the object has an extension that matches this class' schema URI."""
        # FIXME: this override should be removed once an official and versioned schema is released
        # ignore the original implementation logic for a version regex
        # since in our case, the VERSION_REGEX is not fulfilled (i.e. using 'main' branch, no tag available...)
        ext_uri = cls.get_schema_uri()
        return obj.stac_extensions is not None and any(uri == ext_uri for uri in obj.stac_extensions)

    @classmethod
    def ext(cls, obj: T, add_if_missing: bool = False) -> CFExtension[T]:
        """Extend the given STAC Object with properties from the :stac-ext:`CF Extension <cf>`.

        This extension can be applied to instances of :class:`~pystac.Item`, :class:`~pystac.Asset`,
        :class:`~pystac.Collection`, or :class:`~pystac.extensions.item_assets.AssetDefinition`.

        Raises
        ------
        pystac.ExtensionTypeError : If an invalid object type is passed.
        """
        if isinstance(obj, pystac.Collection):
            cls.ensure_has_extension(obj, add_if_missing)
            return cast(CFExtension[T], CollectionCFExtension(obj))
        elif isinstance(obj, pystac.Item):
            cls.ensure_has_extension(obj, add_if_missing)
            return cast(CFExtension[T], ItemCFExtension(obj))
        elif isinstance(obj, pystac.Asset):
            cls.ensure_owner_has_extension(obj, add_if_missing)
            return cast(CFExtension[T], AssetCFExtension(obj))
        elif isinstance(obj, item_assets.AssetDefinition):
            cls.ensure_owner_has_extension(obj, add_if_missing)
            return cast(CFExtension[T], ItemAssetsCFExtension(obj))
        else:
            raise pystac.ExtensionTypeError(cls._ext_error_message(obj))


class ItemCFExtension(CFExtension[pystac.Item]):
    """
    A concrete implementation of :class:`CFExtension` on an :class:`~pystac.Item`.

    Extends the properties of the Item to include properties defined in the
    :stac-ext:`CF Extension <cf>`.

    This class should generally not be instantiated directly. Instead, call
    :meth:`CFExtension.ext` on an :class:`~pystac.Item` to extend it.
    """

    def __init__(self, item: pystac.Item) -> None:
        self.item = item
        self.properties = item.properties

    def get_assets(
        self,
        service_type: Optional[ServiceType] = None,
    ) -> dict[str, pystac.Asset]:
        """Get the item's assets where service types are defined.

        Args:
            service_type: If set, filter the assets such that only those with a
                matching :class:`~STACpopulator.stac_utils.ServiceType` are returned.

        Returns
        -------
        Dict[str, Asset]: A dictionary of assets that match ``service_type``
            if set, or else all of this item's assets where service types are defined.
        """
        return {
            key: asset
            for key, asset in self.item.get_assets().items()
            if (service_type is ServiceType and service_type.value in asset.extra_fields)

Contributor

Won't this always be False since service_type is an enum value and ServiceType is the class?

Collaborator Author

You're right. I copied it from the cmip6.py extension module and didn't pay much attention. But yes, isinstance() should be used here.

            or any(ServiceType.from_value(field, default=None) is ServiceType for field in asset.extra_fields)
        }

    def __repr__(self) -> str:
        """Return repr."""
        return f"<ItemCFExtension Item id={self.item.id}>"
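Both "is ServiceType" comparisons in get_assets compare an enum member (or a lookup result) against the enum class itself, so they are always False, as the thread above notes. A self-contained sketch of the filter with isinstance(), operating on plain dicts of extra fields: the ServiceType enum here is a hypothetical stand-in with illustrative members, and a value-set lookup replaces ServiceType.from_value, whose exact behaviour is not assumed. As a design choice following the docstring, a given service_type restricts the result instead of being OR-ed with the "any service type" condition.

```python
from enum import Enum
from typing import Any, Dict, Optional


class ServiceType(Enum):
    """Hypothetical stand-in for STACpopulator.stac_utils.ServiceType; the members are illustrative."""

    httpserver = "httpserver"
    opendap = "opendap"
    wms = "wms"


# All known service type values, used when no specific service_type is requested
KNOWN_SERVICE_VALUES = {member.value for member in ServiceType}


def filter_assets_by_service(
    assets: Dict[str, Dict[str, Any]],
    service_type: Optional[ServiceType] = None,
) -> Dict[str, Dict[str, Any]]:
    """Filter assets (key -> extra_fields) by service type.

    If service_type is an enum member, keep only assets whose extra fields contain its value;
    otherwise keep every asset that has at least one known service type.
    """
    # isinstance() tests enum membership; "service_type is ServiceType" compares a member to the class
    if isinstance(service_type, ServiceType):
        return {key: fields for key, fields in assets.items() if service_type.value in fields}
    return {key: fields for key, fields in assets.items() if any(f in KNOWN_SERVICE_VALUES for f in fields)}
```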


class ItemAssetsCFExtension(CFExtension[item_assets.AssetDefinition]):
    """Extension for CF item assets."""

    properties: dict[str, Any]
    asset_defn: item_assets.AssetDefinition

    def __init__(self, item_asset: item_assets.AssetDefinition) -> None:
        self.asset_defn = item_asset
        self.properties = item_asset.properties


class AssetCFExtension(CFExtension[pystac.Asset]):
    """
    A concrete implementation of :class:`CFExtension` on an :class:`~pystac.Asset`.

    Extends the Asset fields to include properties defined in the
    :stac-ext:`CF Extension <cf>`.

    This class should generally not be instantiated directly. Instead, call
    :meth:`CFExtension.ext` on an :class:`~pystac.Asset` to extend it.
    """

    asset_href: str
    """The ``href`` value of the :class:`~pystac.Asset` being extended."""

    properties: dict[str, Any]
    """The :class:`~pystac.Asset` fields, including extension properties."""

    additional_read_properties: Optional[Iterable[dict[str, Any]]] = None
    """If present, this will be a list containing 1 dictionary representing the
    properties of the owning :class:`~pystac.Item`."""

    def __init__(self, asset: pystac.Asset) -> None:
        self.asset_href = asset.href
        self.properties = asset.extra_fields
        if asset.owner and isinstance(asset.owner, pystac.Item):
            self.additional_read_properties = [asset.owner.properties]

    def __repr__(self) -> str:
        """Return repr."""
        return f"<AssetCFExtension Asset href={self.asset_href}>"


class CollectionCFExtension(CFExtension[pystac.Collection]):
    """A concrete implementation of :class:`CFExtension` on a :class:`~pystac.Collection`."""

    def __init__(self, collection: pystac.Collection) -> None:
        self.collection = collection
        # PropertiesExtension reads and writes through self.properties;
        # Collection-level extension fields live in extra_fields
        self.properties = collection.extra_fields
84 changes: 84 additions & 0 deletions STACpopulator/extensions/contact.py
@@ -0,0 +1,84 @@
"""Contact data model."""

from dataclasses import dataclass, field
from typing import List, Optional

import pystac
from dataclasses_json import LetterCase, config, dataclass_json

SCHEMA_URI = "https://stac-extensions.github.io/contacts/v0.1.1/schema.json"

Contributor

Is this used anywhere?

Contributor

Also, if we already have the JSON schema for this extension, is there a reason why we're redefining all the various fields in the dataclasses as well?

I thought the dataclasses were meant to extend the JSON schemas, not to replicate the same information.

Collaborator Author

Ah! Didn't know that :) Makes sense.

Collaborator

The schema reference must be included in the Collection/Item stac_extensions; otherwise, it won't be effective (nor validated during the POST API request).

Normally, the https://github.com/stac-utils/pystac/tree/main/pystac/extensions definitions auto-apply this reference when .ext() is invoked on a given pystac.STACObject.

pystac's approach is to extend the object with additional properties and apply any such references within the extension (all self-contained). This avoids having to (or forgetting to) manually add item["stac_extensions"].append(SCHEMA_URI). It also avoids importing these references for every possible schema to apply. The drawback is that their "extension" classes do not use the latest Python approaches (dataclasses/pydantic models), which is why we have the "helpers" here to map between the two strategies.

As we can see in this PR, the classes below are explicitly imported individually in STACpopulator/populator_base.py and require various updates to it. This does not scale well to future extensions. They should all be inserted via the "helper" mechanism, and only this "helper" should deal with mapping any relevant properties coming from the config, their location in the collection/items, and the insertion of the schema URI.
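The helper mechanism described above can be sketched on plain STAC dicts: one helper per extension owns its schema URI, where its properties live, and the mapping from config keys, so populators never append to stac_extensions by hand. This is an illustration under assumptions only; ExtensionHelper, field_map and the "cf_parameters" config key are hypothetical and do not reflect the actual STACpopulator helper API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping


@dataclass
class ExtensionHelper:
    """Hypothetical per-extension helper: owns the schema URI, prefix and config-to-property mapping."""

    schema_uri: str
    prefix: str
    field_map: Dict[str, str] = field(default_factory=dict)  # config key -> property name (unprefixed)

    def apply(self, stac_obj: Dict[str, Any], config: Mapping[str, Any]) -> Dict[str, Any]:
        """Copy mapped config values into stac_obj and register the schema URI exactly once."""
        # STAC Items ("type": "Feature") hold extension fields in "properties"; Collections hold them at the top level
        target = stac_obj.setdefault("properties", {}) if stac_obj.get("type") == "Feature" else stac_obj
        for config_key, prop_name in self.field_map.items():
            if config_key in config:
                target[self.prefix + prop_name] = config[config_key]
        extensions = stac_obj.setdefault("stac_extensions", [])
        if self.schema_uri not in extensions:  # idempotent: repeated calls never duplicate the URI
            extensions.append(self.schema_uri)
        return stac_obj


# Usage: the populator only calls apply(); it never touches stac_extensions itself
cf_helper = ExtensionHelper(
    schema_uri="https://stac-extensions.github.io/cf/v0.2.0/schema.json",
    prefix="cf:",
    field_map={"cf_parameters": "parameter"},  # hypothetical config key
)
item = {"type": "Feature", "properties": {}}
cf_helper.apply(item, {"cf_parameters": [{"name": "air_temperature", "unit": "K"}]})
```

Adding a new extension then means registering one more helper instance rather than editing populator_base.py.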



@dataclass_json
@dataclass
class Info:
    """Contact information (e.g. a phone number or an email address) and its "roles"."""

    value: str
    """The actual contact information, e.g. the phone number or the email address."""

    roles: Optional[List[str]] = None
    """The type(s) of this contact information, e.g. whether it's at work or at home."""


@dataclass_json
@dataclass
class Address:
    """Physical location at which contact can be made."""

    deliveryPoint: Optional[List[str]] = field(metadata=config(letter_case=LetterCase.CAMEL), default=None)
    """Address lines for the location, for example a street name and a house number."""

    city: Optional[str] = None
    """City for the location."""

    administrativeArea: Optional[str] = field(metadata=config(letter_case=LetterCase.CAMEL), default=None)
    """State or province of the location."""

    postalCode: Optional[str] = field(metadata=config(letter_case=LetterCase.CAMEL), default=None)
    """ZIP or other postal code."""

    country: Optional[str] = None
    """Country of the physical address."""


@dataclass_json
@dataclass
class Contact:
    """Provides information about a contact."""

    name: Optional[str] = None
    """The name of the responsible person. Required if organization is missing."""

    organization: Optional[str] = None
    """Organization or affiliation of the contact. Required if name is missing."""

    identifier: Optional[str] = None
    """A value uniquely identifying the contact."""

    position: Optional[str] = None
    """The name of the role or position of the responsible person."""

    description: Optional[str] = None
    """Detailed multi-line description to fully explain the STAC entity."""

    logo: Optional[pystac.Link] = None
    """Graphic identifying the contact."""

    phones: Optional[List[Info]] = None
    """Telephone numbers at which contact can be made."""

    emails: Optional[List[Info]] = None
    """Email addresses at which contact can be made."""

    addresses: Optional[List[Address]] = None
    """Physical locations at which contact can be made."""

    links: Optional[List[pystac.Link]] = None
    """Links related to the contact."""

    contactInstructions: Optional[str] = field(metadata=config(letter_case=LetterCase.CAMEL), default=None)
    """Supplemental instructions on how or when to contact the responsible party."""

    roles: Optional[List[str]] = None
    """The set of named duties, job functions and/or permissions associated with this contact."""
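Commit c03b5cf later removed the dataclasses-json dependency. Because the field names above already use the contacts schema's camelCase spelling (deliveryPoint, postalCode, contactInstructions), the standard library alone can produce a JSON-ready dict. A minimal sketch covering only a subset of the fields; to_stac_dict is a hypothetical name, and the JSON schema remains the authority for validation.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Info:
    value: str
    roles: Optional[List[str]] = None


@dataclass
class Contact:
    # Subset of the Contact fields above; names keep the schema's camelCase spelling
    name: Optional[str] = None
    organization: Optional[str] = None
    emails: Optional[List[Info]] = None
    contactInstructions: Optional[str] = None


def to_stac_dict(obj: Any) -> Dict[str, Any]:
    """Recursively convert a dataclass instance to a dict, omitting fields that are None."""
    # asdict() also applies dict_factory to nested dataclasses, including those inside lists
    return asdict(obj, dict_factory=lambda items: {k: v for k, v in items if v is not None})


contact = Contact(organization="Example Org", emails=[Info(value="info@example.org", roles=["work"])])
print(to_stac_dict(contact))
# {'organization': 'Example Org', 'emails': [{'value': 'info@example.org', 'roles': ['work']}]}
```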