stac-utils/xstac#38 is prototyping how we might store Kerchunk indices in STAC items. Storing Kerchunk metadata in STAC items removes the need to put that metadata in some sidecar file: https://tomaugspurger.net/posts/stac-updates/#stac-and-kerchunk.
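For context, plain Kerchunk references are basically a JSON mapping in which the Zarr metadata keys (`.zarray`, `.zattrs`) hold JSON-encoded strings and the chunk keys point at byte ranges in the source file. Roughly (values made up for illustration):

```python
# Rough shape of a Kerchunk reference set (variable name, paths, and numbers
# invented for illustration). The Zarr metadata lives in JSON-encoded strings;
# chunk keys map to [url, byte offset, byte length] in the original file.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "temperature/.zarray": '{"shape": [1, 180, 360], "chunks": [1, 180, 360], "dtype": "<f4", ...}',
        "temperature/.zattrs": '{"units": "K"}',
        "temperature/0.0.0": ["s3://bucket/file.nc", 30000, 480000],
    },
}
```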
The high-level goal is to store the metadata needed for Kerchunk under the fields added by the datacube extension. This lets us deduplicate a few fields (the attrs, and maybe others). I'm not sure whether this is worth doing, because now you need a function to translate between Kerchunk-in-STAC and the plain Kerchunk references (sketched below). But I don't think we should be putting JSON strings like `.zarray` in the STAC objects, so we'll need something like that anyway, I think.
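To make that concrete, here's a hypothetical sketch of what that translation could look like. The property names (`kerchunk:zarray`, `kerchunk:refs`) are made up for illustration; they aren't what the xstac prototype actually uses.

```python
import json


# Hypothetical sketch only: the "kerchunk:zarray" / "kerchunk:refs" property
# names are assumptions for illustration, not the actual xstac#38 layout.
def stac_item_to_kerchunk(item) -> dict:
    """Rebuild plain Kerchunk references from structured STAC properties."""
    refs = {".zgroup": json.dumps({"zarr_format": 2})}
    for name, var in item.properties["cube:variables"].items():
        # Re-serialize the structured metadata into the JSON strings the
        # Zarr reader expects ...
        refs[f"{name}/.zarray"] = json.dumps(var["kerchunk:zarray"])
        refs[f"{name}/.zattrs"] = json.dumps(var.get("attrs", {}))
        # ... and copy the per-chunk [url, offset, length] references through.
        refs.update(var["kerchunk:refs"])
    return {"version": 1, "refs": refs}
```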
Here's a hacky version of what I have in mind, using this item collection: https://gist.github.com/TomAugspurger/5b5f40c34212b8302e824e66b477062a.
```python
import pystac
import xstac
import kerchunk.combine
import fsspec
import xarray as xr


class STACKerchunkBackend(xr.backends.BackendEntrypoint):
    open_dataset_parameters = ["filename_or_obj", "drop_variables"]

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        if isinstance(filename_or_obj, (list, pystac.ItemCollection)):
            # Translate each item's Kerchunk-in-STAC fields back to plain
            # Kerchunk references, then combine them along time.
            refs = [xstac.kerchunk.stac_to_kerchunk(item) for item in filename_or_obj]
            refs2 = kerchunk.combine.MultiZarrToZarr(refs, concat_dims="time").translate()
        else:
            # A single STAC item translates to a single set of references.
            refs2 = xstac.kerchunk.stac_to_kerchunk(filename_or_obj)
        return xr.open_dataset(
            fsspec.filesystem("reference", fo=refs2).get_mapper(),
            engine="zarr",
            consolidated=False,
        )


ic = pystac.ItemCollection.from_file("item_collection.json")
ds = xr.open_dataset(list(ic), engine=STACKerchunkBackend, chunks={})
ds
```
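Opening a single item (rather than a list or ItemCollection) should hit the second branch and skip the combine step:

```python
# Same backend, single item: no MultiZarrToZarr combine step.
ds_single = xr.open_dataset(list(ic)[0], engine=STACKerchunkBackend, chunks={})
```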