Thanks for starting the discussion @jsignell. I think STAC is by nature flexible, and this is why it gained a lot of traction. There are interesting thoughts in this ticket. I particularly like the fact that you could have a … I'm not saying it's bad, but I'm worried about the complexity we will add and some 🕳️ we could face if we try to be too specific (e.g. extension versions, item and/or asset level).
Abstract
The idea is to add a layer of validation called a "Profile" that combines several extensions, possibly increasing the required fields for a given extension, and specifies how the collection is structured. This concept would be captured programmatically to allow for profile-level validation.
Motivation and Scope
Flexibility is the enemy of interoperability.
STAC is a flexible specification, by design. You can include any fields you like in metadata, whether or not they are in an extension. You can also use any combination of extensions, and when using an extension you can include or omit various fields. This flexibility makes it hard for users to make sense of the landscape, and hard for tools to operate on collections from different providers even when they represent the same type of data.
Say a user has an algorithm for SAR data. Sometimes the SAR data comes from one producer and sometimes from another - how do they know that it's compatible?
Usage and Impact
People trying to catalog a new dataset with STAC can choose a profile rather than selecting from dozens of extensions. This should make it easier to figure out what metadata fields to include.
Tools and people consuming STAC can see that an object conforms to a particular profile and therefore know what actions it supports.
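As a rough sketch of the consumer side (assuming a hypothetical top-level `profile` field, which this proposal does not pin down), a tool could dispatch on the declared profile instead of inspecting individual extensions and fields:

```python
# Hypothetical pipelines a tool supports, keyed by profile name.
def process_sar(collection: dict) -> str:
    return f"ran SAR pipeline on {collection['id']}"

def process_optical(collection: dict) -> str:
    return f"ran optical pipeline on {collection['id']}"

HANDLERS = {"sar": process_sar, "optical": process_optical}

def process(collection: dict) -> str:
    """Pick an action based on the profile the collection advertises."""
    handler = HANDLERS.get(collection.get("profile"))
    if handler is None:
        raise NotImplementedError(f"no support for profile {collection.get('profile')!r}")
    return handler(collection)
```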
Detailed Description
Profiles are essentially groupings of extensions that are practical to use together to satisfy the needs of a particular type of structured data. By structured data we do not mean a dataset or product, but a more general data type, for instance earth system model output or satellite imagery.
Each collection gets exactly one profile, declared at the collection level, and everything in that collection follows it. Advertising the profile on the collection lets users know what to expect from it. In the tooling there can be a validation step where the declared extensions have to match the profile.
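A minimal sketch of what that could look like, assuming a hypothetical `profile` field on the collection and a registry mapping profile names to the extension schemas they require (none of this is settled by the proposal):

```python
# Hypothetical registry: profile name -> extension schema URIs it requires.
PROFILES = {
    "sar": {
        "https://stac-extensions.github.io/sar/v1.0.0/schema.json",
        "https://stac-extensions.github.io/sat/v1.0.0/schema.json",
    },
}

collection = {
    "type": "Collection",
    "id": "example-sar-collection",
    "profile": "sar",  # hypothetical field advertising the profile
    "stac_extensions": [
        "https://stac-extensions.github.io/sar/v1.0.0/schema.json",
        "https://stac-extensions.github.io/sat/v1.0.0/schema.json",
    ],
}

def validate_profile(collection: dict) -> None:
    """Check that a collection declares every extension its profile requires."""
    required = PROFILES[collection["profile"]]
    missing = required - set(collection["stac_extensions"])
    if missing:
        raise ValueError(f"missing extensions required by profile: {missing}")

validate_profile(collection)  # passes for the example above
```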
The concept of a profile will be in STAC core. It's optional. Profiles are purely additive. You can still consume STAC that uses a profile without knowing anything about profiles.
A profile:
A profile will not:
Related Work
The information that we propose capturing in a "Profile" has so far been scattered across STAC extensions, best practice docs, and stactools packages. By creating a structure to capture this level of information, the "Profile" concept would make these obsolete or much simpler.
Extensions that might really be profiles
There are some extensions that could be elevated to profiles. For example, the CEOS ARD STAC Extension defines different product family specifications and maps each to either an "Optical Profile" or a "Radar Profile".
The Machine Learning Model STAC Extension is another example of an extension that is really more than an extension. The fact that it has its own best practices document specifying which extensions to use with it underscores that difference.
Best practices docs
There are several examples of best practices that constrain the flexibility inherent in STAC and try to lay out what extensions should be used and which fields must be specified:
stactools packages
In a way, the stactools packages are an even more constrained version of a profile: they specify exactly how to create a STAC item representing, for instance, a Sentinel-2 scene. Profiles are essentially one level of abstraction up from stactools packages.
Validation tools
Also of note is @gadomski's work on heystac, which applies ratings to various STAC APIs based on their compliance with best practices.
Implementation
The idea is to use pydantic models, similar to heystac. These models can be used for validation directly, or they can be used to generate JSON Schema, which can be used across languages (not just Python).
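For example (a sketch, not heystac's actual models), a profile could be a pydantic model whose validators encode the profile's constraints, with `model_json_schema()` producing a JSON Schema for non-Python tooling. The `SarProfileCollection` model and its required-extension rule below are assumptions for illustration:

```python
from pydantic import BaseModel, Field, field_validator

class SarProfileCollection(BaseModel):
    """Hypothetical model for a collection that claims a 'sar' profile."""

    id: str
    stac_extensions: list[str] = Field(default_factory=list)

    @field_validator("stac_extensions")
    @classmethod
    def requires_sar_extension(cls, value: list[str]) -> list[str]:
        # Assumption: the 'sar' profile requires the SAR extension.
        if not any("stac-extensions.github.io/sar/" in uri for uri in value):
            raise ValueError("profile 'sar' requires the SAR extension")
        return value

# Validate a candidate collection directly in Python...
SarProfileCollection.model_validate(
    {
        "id": "example",
        "stac_extensions": ["https://stac-extensions.github.io/sar/v1.0.0/schema.json"],
    }
)

# ...or emit JSON Schema for tooling in other languages.
schema = SarProfileCollection.model_json_schema()
```

One caveat with this approach: custom Python validators like the one above do not carry over into the generated JSON Schema, so constraints that must be enforced cross-language would need to be expressed with schema-level constructs instead.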
Open questions: