
Support Kerchunk indices embedded in STAC items #32

@TomAugspurger


stac-utils/xstac#38 is prototyping how we might store Kerchunk indices in STAC items. Storing Kerchunk metadata in STAC items removes the need for a separate sidecar file holding that metadata: https://tomaugspurger.net/posts/stac-updates/#stac-and-kerchunk.

The high-level goal is to store the metadata needed for Kerchunk under the fields added by the datacube extension. This lets us deduplicate a few fields (like the attrs, and maybe others). I'm not sure whether this is worth doing, because it means you need a function to translate between Kerchunk-in-STAC and the plain Kerchunk references. But I don't think we should be putting JSON strings like .zarray in the STAC objects, so I think we'll need something like that anyway.
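To make that translation step concrete, here is a minimal sketch of both directions. It assumes a hypothetical layout that is not necessarily the one in stac-utils/xstac#38. Each entry in the datacube extension's cube:variables keeps its Zarr array metadata as a structured object under kerchunk:zarray, its chunk references under kerchunk:refs, and its attributes under attrs. The plain side follows the version-1 Kerchunk reference format, where .zarray and .zattrs are JSON strings and each chunk maps to [url, offset, length]. Rather than storing the variable's datacube dimensions list twice, the sketch reuses it as xarray's _ARRAY_DIMENSIONS attribute. The URL and byte offsets in the example item are placeholders.

```python
import json

# Hypothetical STAC-side layout (an assumption, not xstac's actual schema):
# cube:variables[name] holds "dimensions" (datacube extension), "attrs",
# "kerchunk:zarray" (structured .zarray) and "kerchunk:refs" (chunk refs).


def stac_to_refs(item: dict) -> dict:
    """Build a version-1 Kerchunk reference dict from a STAC item dict."""
    refs = {".zgroup": json.dumps({"zarr_format": 2})}
    for name, var in item["properties"]["cube:variables"].items():
        # Zarr expects .zarray as a JSON string; STAC keeps it as a plain object.
        refs[f"{name}/.zarray"] = json.dumps(var["kerchunk:zarray"])
        # xarray reads dimension names from _ARRAY_DIMENSIONS in .zattrs, so the
        # datacube "dimensions" list is reused instead of being stored twice.
        attrs = dict(var.get("attrs", {}))
        attrs["_ARRAY_DIMENSIONS"] = var["dimensions"]
        refs[f"{name}/.zattrs"] = json.dumps(attrs)
        # Chunk references ([url, offset, length]) are copied through as-is.
        for chunk_key, ref in var.get("kerchunk:refs", {}).items():
            refs[f"{name}/{chunk_key}"] = ref
    return {"version": 1, "refs": refs}


def refs_to_stac_variables(refs: dict) -> dict:
    """Inverse direction: fold Kerchunk references back into cube:variables."""
    out = {}
    for key, value in refs["refs"].items():
        if "/" not in key:
            continue  # skip group-level keys such as .zgroup
        name, leaf = key.split("/", 1)
        var = out.setdefault(name, {"kerchunk:refs": {}})
        if leaf == ".zarray":
            var["kerchunk:zarray"] = json.loads(value)
        elif leaf == ".zattrs":
            attrs = json.loads(value)
            var["dimensions"] = attrs.pop("_ARRAY_DIMENSIONS", [])
            var["attrs"] = attrs
        else:
            var["kerchunk:refs"][leaf] = value
    return out


# Example item using the hypothetical layout (URL and offsets are placeholders).
item = {
    "type": "Feature",
    "properties": {
        "cube:variables": {
            "air": {
                "type": "data",
                "dimensions": ["time", "lat"],
                "attrs": {"units": "K"},
                "kerchunk:zarray": {
                    "chunks": [1, 2], "compressor": None, "dtype": "<f4",
                    "fill_value": None, "filters": None, "order": "C",
                    "shape": [2, 2], "zarr_format": 2,
                },
                "kerchunk:refs": {
                    "0.0": ["s3://example-bucket/file.nc", 1000, 8],
                    "1.0": ["s3://example-bucket/file.nc", 1008, 8],
                },
            }
        }
    },
}

refs = stac_to_refs(item)
variables = refs_to_stac_variables(refs)
```

Keeping the STAC side structured and producing JSON strings only at translation time keeps the items readable and queryable. The cost is that every consumer needs this translation function.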

Here's a hacky version of what I have in mind, using this item collection: https://gist.github.com/TomAugspurger/5b5f40c34212b8302e824e66b477062a.

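```python
import fsspec
import kerchunk.combine
import pystac
import xarray as xr
import xstac.kerchunk  # prototype module from stac-utils/xstac#38


class STACKerchunkBackend(xr.backends.BackendEntrypoint):
    # Declares the arguments open_dataset accepts; xarray uses this to decide
    # which decoding options to forward to the backend.
    open_dataset_parameters = ["filename_or_obj", "drop_variables"]

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        if isinstance(filename_or_obj, (list, pystac.ItemCollection)):
            # Translate each item's embedded metadata into plain Kerchunk
            # references, then combine them along the time dimension.
            refs = [xstac.kerchunk.stac_to_kerchunk(item) for item in filename_or_obj]
            refs2 = kerchunk.combine.MultiZarrToZarr(refs, concat_dims="time").translate()
        else:
            # A single item needs no combining.
            refs2 = xstac.kerchunk.stac_to_kerchunk(filename_or_obj)

        # Expose the references as a Zarr store via fsspec's reference filesystem.
        mapper = fsspec.filesystem("reference", fo=refs2).get_mapper()
        return xr.open_dataset(
            mapper, engine="zarr", consolidated=False, drop_variables=drop_variables
        )


ic = pystac.ItemCollection.from_file("item_collection.json")

# chunks={} opens the variables lazily with dask, using the store's chunking.
ds = xr.open_dataset(list(ic), engine=STACKerchunkBackend, chunks={})
ds
```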
