
Using hidefix to determine byte ranges in HDF files? #38

@TomNicholas


I'm building VirtualiZarr, an evolution of kerchunk that lets you determine the byte ranges of chunks in netCDF files and then concatenate the virtual representations of those chunks using xarray's API.

This works by creating a ChunkManifest object in-memory (one per netCDF Variable per file initially), then defining ways to merge those manifests.
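As a rough illustration of the idea (not VirtualiZarr's actual implementation), a manifest can be pictured as a mapping from Zarr-style chunk keys to a file path plus a byte range, and merging as re-keying one manifest's chunks along the concatenation axis. The names `ChunkEntry` and `concat_manifests` below are hypothetical, as are the file names and offsets.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ChunkEntry:
    """Where one chunk lives: which file, and which bytes in it (hypothetical type)."""
    path: str
    offset: int
    length: int


def concat_manifests(
    first: Dict[str, ChunkEntry],
    second: Dict[str, ChunkEntry],
    first_nchunks: int,
    axis: int = 0,
) -> Dict[str, ChunkEntry]:
    """Concatenate two manifests along `axis` without touching any data.

    `first_nchunks` is the number of chunks `first` has along `axis`;
    every chunk index of `second` is shifted by that amount.
    """
    merged = dict(first)
    for key, entry in second.items():
        idx = [int(i) for i in key.split(".")]
        idx[axis] += first_nchunks
        merged[".".join(str(i) for i in idx)] = entry
    return merged


# Made-up byte ranges for two files holding the same variable.
a = {"0.0": ChunkEntry("a.nc", 100, 50), "1.0": ChunkEntry("a.nc", 150, 50)}
b = {"0.0": ChunkEntry("b.nc", 100, 50)}

combined = concat_manifests(a, b, first_nchunks=2, axis=0)
```

Only the keys change during the merge; each entry keeps pointing at the same bytes in its original file.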

What I'm wondering is whether hidefix's Index class could be useful to me as a way to generate the ChunkManifest for a netCDF file without using kerchunk/fsspec (see this issue). In other words, I would use hidefix only to determine the byte ranges, not to actually read the data. (I plan to read the bytes later using the Rust object-store crate; see zarr-developers/zarr-python#1661.)
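To show why the two steps can be separated, here is a minimal sketch of the reading side, assuming the indexing step has already produced a `(path, offset, length)` triple. A local temporary file stands in for an object store, and `read_byte_range` is a hypothetical helper, not part of hidefix, VirtualiZarr, or object-store.

```python
import os
import tempfile


def read_byte_range(path: str, offset: int, length: int) -> bytes:
    """Fetch exactly `length` bytes starting at `offset`, with no HDF5 library involved."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Fake "file": a 6-byte header followed by two 7-byte chunks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADERchunk-0chunk-1")
    demo_path = tmp.name

# The byte range would come from whatever tool indexed the file.
chunk = read_byte_range(demo_path, offset=6, length=7)
os.remove(demo_path)
```

The reader only needs the offset and length, so any tool that can report them reliably could in principle fill the manifest.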

Q's:

  • Is this idea dumb?
  • Does hidefix.Index contain the byte range information I'm assuming it does?
  • Can hidefix read over S3?
  • Would I be better off just using h5py directly?

cc @norlandrhagen

xref pydata/xarray#7446
