Thanks for starting the discussion @jsignell. I think STAC is by nature flexible, and this is why it gained a lot of traction. There are interesting thoughts in this ticket. I particularly like the fact that you could have a … I'm not saying it's bad, but I'm worried about the complexity we will add and some 🕳️ we could face if we try to be too specific (e.g. extension versions, item and/or asset level).
Abstract
The idea is to add a layer of validation called a "Profile" that combines several extensions, possibly increasing the required fields for a given extension, and specifies how the collection is structured. This concept would be captured programmatically to allow for profile-level validation.
Motivation and Scope
Flexibility is the enemy of interoperability.
STAC is a flexible specification, by design. You can include any fields you like in metadata, whether or not they are in an extension. You can also use any combination of extensions, and when using an extension you can include or omit various fields. This flexibility makes it hard for users to make sense of the landscape, and hard for tools to operate on collections from different providers even when they represent the same type of data.
Say a user has an algorithm for SAR data. Sometimes the SAR data comes from one producer and sometimes from another - how do they know that it's compatible?
Usage and Impact
People trying to catalog a new dataset with STAC can choose a profile rather than selecting from dozens of extensions. This should make it easier to figure out what metadata fields to include.
Tools and people consuming STAC can see that an object conforms to a particular profile and therefore know what actions it supports.
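As a rough sketch of the consumer side (assuming a hypothetical top-level `profile` field, which this proposal does not pin down), a tool could dispatch on the declared profile instead of inspecting individual extensions and fields:

```python
# Hypothetical pipelines a tool supports, keyed by profile name.
def process_sar(collection: dict) -> str:
    return f"ran SAR pipeline on {collection['id']}"

def process_optical(collection: dict) -> str:
    return f"ran optical pipeline on {collection['id']}"

HANDLERS = {"sar": process_sar, "optical": process_optical}

def process(collection: dict) -> str:
    """Pick an action based on the profile the collection advertises."""
    handler = HANDLERS.get(collection.get("profile"))
    if handler is None:
        raise NotImplementedError(f"no support for profile {collection.get('profile')!r}")
    return handler(collection)
```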
Detailed Description
Profiles are essentially groupings of extensions that are practical to use together to satisfy the needs of a particular type of structured data. By structured data we do not mean a dataset or product, but a more general data type, for instance earth system model output or satellite imagery.
Each collection gets exactly one profile, declared at the collection level, and everything in that collection follows it. Advertising the profile on the collection lets users know what to expect from it. In the tooling there can be a validation step where the declared extensions have to match the profile.
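A minimal sketch of what that could look like, assuming a hypothetical `profile` field on the collection and a registry mapping profile names to the extension schemas they require (none of this is settled by the proposal):

```python
# Hypothetical registry: profile name -> extension schema URIs it requires.
PROFILES = {
    "sar": {
        "https://stac-extensions.github.io/sar/v1.0.0/schema.json",
        "https://stac-extensions.github.io/sat/v1.0.0/schema.json",
    },
}

collection = {
    "type": "Collection",
    "id": "example-sar-collection",
    "profile": "sar",  # hypothetical field advertising the profile
    "stac_extensions": [
        "https://stac-extensions.github.io/sar/v1.0.0/schema.json",
        "https://stac-extensions.github.io/sat/v1.0.0/schema.json",
    ],
}

def validate_profile(collection: dict) -> None:
    """Check that a collection declares every extension its profile requires."""
    required = PROFILES[collection["profile"]]
    missing = required - set(collection["stac_extensions"])
    if missing:
        raise ValueError(f"missing extensions required by profile: {missing}")

validate_profile(collection)  # passes for the example above
```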
The concept of a profile will be in STAC core. It's optional. Profiles are purely additive. You can still consume STAC that uses a profile without knowing anything about profiles.
A profile:
A profile will not:
Related Work
The information that we propose capturing in a "Profile" has so far been scattered across STAC extensions, best practice docs, and stactools packages. By creating a structure to capture this level of information, the "Profile" concept would make these obsolete or much simpler.
Extensions that might really be profiles
There are some extensions that could be elevated to profiles. For example, the CEOS ARD STAC Extension defines different product family specifications and maps each to either an "Optical Profile" or a "Radar Profile".
The Machine Learning Model STAC Extension is another example of an extension that is really more than an extension. The fact that it has its own best practices document specifying which extensions to use with it underscores that difference.
Best practices docs
There are several examples of best practices that constrain the flexibility inherent in STAC and try to lay out what extensions should be used and which fields must be specified:
stactools packages
In a way, the stactools packages are an even more constrained version of a profile: they specify exactly how to create a STAC item representing, for instance, a Sentinel-2 scene. Profiles are essentially one level of abstraction up from stactools packages.
Validation tools
Also of note is @gadomski's work on heystac, which applies ratings to various STAC APIs based on their compliance with best practices.
Implementation
The idea is to use pydantic models, similar to heystac. These models can be used for validation directly, or they can be used to generate JSON Schema, which can be used across languages (not just Python).
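For example (a sketch, not heystac's actual models), a profile could be a pydantic model whose validators encode the profile's constraints, with `model_json_schema()` producing a JSON Schema for non-Python tooling. The `SarProfileCollection` model and its required-extension rule below are assumptions for illustration:

```python
from pydantic import BaseModel, Field, field_validator

class SarProfileCollection(BaseModel):
    """Hypothetical model for a collection that claims a 'sar' profile."""

    id: str
    stac_extensions: list[str] = Field(default_factory=list)

    @field_validator("stac_extensions")
    @classmethod
    def requires_sar_extension(cls, value: list[str]) -> list[str]:
        # Assumption: the 'sar' profile requires the SAR extension.
        if not any("stac-extensions.github.io/sar/" in uri for uri in value):
            raise ValueError("profile 'sar' requires the SAR extension")
        return value

# Validate a candidate collection directly in Python...
SarProfileCollection.model_validate(
    {
        "id": "example",
        "stac_extensions": ["https://stac-extensions.github.io/sar/v1.0.0/schema.json"],
    }
)

# ...or emit JSON Schema for tooling in other languages.
schema = SarProfileCollection.model_json_schema()
```

One caveat with this approach: custom Python validators like the one above do not carry over into the generated JSON Schema, so constraints that must be enforced cross-language would need to be expressed with schema-level constructs instead.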
Open questions: