I am working on a feature in VirtualiZarr to read DMR++ metadata files and create a virtual `xr.Dataset` containing `ManifestArray`s that can then be combined and serialized. This is the current workflow:

```python
vdatasets = parser.parse(dmrs)
# vdatasets are xr.Datasets containing ManifestArrays
mds = xr.combine_nested(list(vdatasets), **xr_combine_kwargs)
mds.virtualize.to_kerchunk(filepath=outfile, format=outformat)
ds = xr.open_dataset(outfile, engine="virtualizarr", ...)
ds.time.values
```
However, the chunk manifests, encoding, attrs, etc. are already in `mds`, so is it possible to read data directly from this dataset? My understanding is that once the "chunk manifest" ZEP is approved and the zarr-python reader in xarray is updated, this should be possible. The xarray reader for kerchunk can accept either a file or the reference JSON object produced directly by kerchunk's `SingleHdf5ToZarr` and `MultiZarrToZarr`. So, similarly, can we extract the refs from `mds` and pass them to `xr.open_dataset()` directly?

There probably still needs to be a function that extracts the refs, so that xarray can build a new `Dataset` object with all the indexes, cf_time handling, and `open_dataset` checks:
```python
mds = xr.combine_nested(list(vdatasets), **xr_combine_kwargs)
refs = mds.virtualize()
ds = xr.open_dataset(refs, engine="virtualizarr", ...)
```
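For concreteness, here is a hedged sketch of what such extracted refs might look like, using the kerchunk version-1 reference format that `to_kerchunk` already serializes to disk. The store keys, file URL, and byte ranges below are invented for illustration only:

```python
import json

# Sketch of a kerchunk "version 1" reference set (not VirtualiZarr's actual
# output). Keys are zarr store paths; values are either inline JSON strings
# (metadata) or [url, offset, length] triples pointing into the source files.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "time/.zarray": json.dumps(
            {
                "shape": [4],
                "chunks": [4],
                "dtype": "<f8",
                "compressor": None,
                "filters": None,
                "fill_value": None,
                "order": "C",
                "zarr_format": 2,
            }
        ),
        "time/.zattrs": json.dumps({"units": "days since 2000-01-01"}),
        # One chunk: 32 bytes of float64 starting at byte 1024 of the file.
        # (Hypothetical URL and byte range.)
        "time/0": ["s3://bucket/granule.nc", 1024, 32],
    },
}
print(sorted(refs["refs"]))
```

An extraction function would only need to walk each variable's chunk manifest and emit these `[url, offset, length]` entries, since `mds` already holds the same information.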
Even reading directly from the `ManifestArray`-backed dataset might be possible, but I'm not sure how the new dataset object, with its numpy arrays and indexes, would be kept separate from the original dataset:
```python
mds = xr.combine_nested(list(vdatasets), **xr_combine_kwargs)
mds.time.values
```
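To illustrate why such loading would naturally produce separate numpy-backed variables, here is a toy sketch (not VirtualiZarr's API) of what resolving one chunk-manifest entry involves: fetch the referenced byte range and decode it into a fresh numpy array, leaving the manifest-backed dataset untouched:

```python
import numpy as np

# Toy chunk manifest: chunk index -> (path, offset, length).
# The structure is illustrative, not VirtualiZarr's internal format.
manifest = {"0": {"path": "data.bin", "offset": 0, "length": 32}}

# Simulate the referenced file with in-memory bytes (8 float32 values).
file_bytes = np.arange(8, dtype="float32").tobytes()

# "Loading" the chunk: slice the byte range, then decode into a new
# numpy array. The original manifest is never modified, which is why a
# loaded Dataset would hold independent numpy-backed variables.
entry = manifest["0"]
chunk = file_bytes[entry["offset"] : entry["offset"] + entry["length"]]
arr = np.frombuffer(chunk, dtype="float32")
print(arr.tolist())
```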