Motivation
In imaging applications, especially interactive ones, the usability of a data array is greatly increased by having pre-computed sub-resolutions of the array. For example, an array of size (10**4, 10**4) might have halving steps pre-computed, providing arrays of sizes 5000, 2500, 1250, 625, 312, etc. Users can quickly load a low-resolution representation to choose which regions are worth loading at higher or even full resolution. A few examples of this trend in imaging file formats are provided under Related reading.
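As a rough illustration of the halving arithmetic, the sketch below computes per-level shapes for a (10**4, 10**4) array, which gives the sizes 5000, 2500, 1250, 625, 312. The helper name `pyramid_shapes` and the `min_size` cutoff of 256 are arbitrary choices for this example, and odd sizes are rounded down.

```python
def pyramid_shapes(shape, min_size=256):
    """Return per-level shapes of a halving pyramid, full resolution first.

    Every dimension is halved (rounded down) per level; the loop stops before
    any dimension would drop below min_size (an illustrative cutoff).
    """
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    levels = [tuple(shape)]
    while True:
        nxt = tuple(d // 2 for d in levels[-1])
        if min(nxt) < min_size:
            break
        levels.append(nxt)
    return levels


for shape in pyramid_shapes((10**4, 10**4)):
    print(shape)
# (10000, 10000)
# (5000, 5000)
# (2500, 2500)
# (1250, 1250)
# (625, 625)
# (312, 312)
```

Because each 2-D level holds about a quarter of the previous level's elements, all reduced levels together add less than a third of the full-resolution size (1/4 + 1/16 + ... < 1/3), which is the storage being traded for speed.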
The current Zarr spec has the following issues when naively specifying such sub-resolutions:
- Arrays of differing size can only be identified as resolutions of the same data by naming convention ("Resolution_0", "Resolution_1", etc.). This issue exists in a number of existing formats.
- Storing data of differing dimensions in the same chunk is not intended.
- Even if data of differing dimensions (compression)
Generalization
In other domains, a generalization of this functionality might enable "summary data" to be stored,
where a function, e.g. averaging, has been applied along a given dimension. This is usually most
beneficial when the function is expensive enough to compute that it is worth trading storage for speed.
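A minimal pure-Python sketch of such a summary, assuming a 2-D array held as nested lists: the hypothetical helper `summarize_2d` applies a reducing function to non-overlapping blocks along the chosen dimensions, with averaging as the default. Using `dims=(0, 1)` and `factor=2` corresponds to one halving step; other reducers, such as `max`, can be swapped in.

```python
from statistics import fmean


def summarize_2d(data, dims=(0, 1), factor=2, reduce=fmean):
    """Apply `reduce` to non-overlapping blocks of `factor` elements along `dims`.

    `data` is a 2-D array given as a list of equal-length rows. Elements that
    do not fill a whole block are dropped, so sizes shrink by floor division.
    """
    fy = factor if 0 in dims else 1  # block height along dimension 0
    fx = factor if 1 in dims else 1  # block width along dimension 1
    out = []
    for i in range(len(data) // fy):
        row = []
        for j in range(len(data[0]) // fx):
            block = [data[i * fy + a][j * fx + b] for a in range(fy) for b in range(fx)]
            row.append(reduce(block))
        out.append(row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
print(summarize_2d(image))              # [[3.5, 5.5]]
print(summarize_2d(image, dims=(1,)))   # [[1.5, 3.5], [5.5, 7.5]]
print(summarize_2d(image, reduce=max))  # [[6, 8]]
```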
Potential implementations
Filter / Memory-layout
Each chunk could be passed to a function which stores or reads the multiscale representation
within the given chunk. (TBD)
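Since this option is still TBD, the following is only a hypothetical sketch of one possible layout: a filter that appends successively halved (pairwise-mean) copies of a 1-D float chunk to the chunk itself, with a small header of level lengths so a reader can decode a single level. The byte layout and the names `encode_multiscale` and `decode_level` are invented for illustration and are not part of any Zarr codec.

```python
from array import array


def encode_multiscale(chunk):
    """Pack a 1-D chunk and all of its halved levels into one byte string.

    Hypothetical layout: [level count][length of each level][all values],
    with signed 64-bit ints for the header and doubles for the values.
    """
    levels = [[float(x) for x in chunk]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Pairwise means; a trailing odd element is dropped.
        levels.append([(prev[2 * i] + prev[2 * i + 1]) / 2 for i in range(len(prev) // 2)])
    header = array("q", [len(levels)] + [len(lvl) for lvl in levels])
    body = array("d", [x for lvl in levels for x in lvl])
    return header.tobytes() + body.tobytes()


def decode_level(buf, level):
    """Return only the values of `level` (0 = full resolution) from an encoded chunk."""
    q = array("q").itemsize
    d = array("d").itemsize
    count = array("q", buf[:q])[0]
    lengths = array("q", buf[q:q + q * count])
    start = q + q * count + d * sum(lengths[:level])
    return list(array("d", buf[start:start + d * lengths[level]]))


encoded = encode_multiscale([1.0, 2.0, 3.0, 4.0])
print(decode_level(encoded, 1))  # [1.5, 3.5]
print(decode_level(encoded, 2))  # [2.5]
```

In this layout a reader still fetches the whole chunk from the store even when it only wants a low level, so the gain is avoiding recomputation, not avoiding I/O.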
Array relationships
Metadata on a given array could specify one or both inheritance relationships to other arrays.
For example, if a child array links to its parent, it might store the following metadata:
{
  "summary_of": {
    "key": "Resolution_0",
    "method": "halving",
    "dimensions": [0, 1]
  }
}
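Assuming each array's attributes are available as a dict keyed by array name (a simplification of reading each array's metadata from the store), the parent links can be followed upward to the full-resolution source. The helper name `find_source` is hypothetical; the attribute layout follows the example above.

```python
def find_source(attrs, key):
    """Follow "summary_of" links from `key` up to the array that summarizes
    nothing, i.e. the full-resolution source.

    `attrs` maps an array key to that array's attribute dict (hypothetical).
    """
    seen = set()
    while "summary_of" in attrs[key]:
        if key in seen:
            raise ValueError(f"cycle in summary_of links at {key!r}")
        seen.add(key)
        key = attrs[key]["summary_of"]["key"]
    return key


attrs = {
    "Resolution_0": {},
    "Resolution_1": {"summary_of": {"key": "Resolution_0", "method": "halving", "dimensions": [0, 1]}},
    "Resolution_2": {"summary_of": {"key": "Resolution_1", "method": "halving", "dimensions": [0, 1]}},
}
print(find_source(attrs, "Resolution_2"))  # Resolution_0
```

Walking in the other direction, from the source down to its summaries, is not possible with these links alone; it requires scanning every array's attributes.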
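One issue with only having the parent relationship defined is determining the lowest
resolution: starting from the full-resolution array, its summaries can only be found by scanning every array's metadata. The child relationships could be represented with: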
{
  "summarized_by": [
    {
      "key": "Resolution_1",
      "method": "halving",
      "dimensions": [0, 1]
    }, ...
  ]
}
but this would require updating source arrays when creating a summary.
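A minimal sketch of both sides of that trade-off, using a hypothetical dict-of-attributes layout keyed by array name: `add_summary` must write to the parent as well as the child, while `find_lowest` can walk down from the source directly. Both helper names are invented, and `find_lowest` assumes a single linear chain (it follows only the first `summarized_by` entry).

```python
def add_summary(attrs, parent, child, method="halving", dims=(0, 1)):
    """Register `child` as a summary of `parent`, writing links in both directions."""
    link = {"method": method, "dimensions": list(dims)}
    attrs[child] = {"summary_of": {"key": parent, **link}}
    # The cost: the existing parent's metadata has to be rewritten as well.
    attrs[parent].setdefault("summarized_by", []).append({"key": child, **link})


def find_lowest(attrs, key):
    """Follow the first "summarized_by" entry down until no summaries remain."""
    seen = set()
    while attrs[key].get("summarized_by"):
        if key in seen:
            raise ValueError(f"cycle in summarized_by links at {key!r}")
        seen.add(key)
        key = attrs[key]["summarized_by"][0]["key"]
    return key


attrs = {"Resolution_0": {}}
add_summary(attrs, "Resolution_0", "Resolution_1")
add_summary(attrs, "Resolution_1", "Resolution_2")
print(find_lowest(attrs, "Resolution_0"))  # Resolution_2
# Resolution_0 was modified even though its data never changed:
print(attrs["Resolution_0"])
```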
An alternative would be to provide a single source of metadata on the relationships between arrays.
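One hypothetical shape for such a single source is an attribute on the enclosing group that lists the levels in order from full to lowest resolution; the attribute name `multiscale`, its fields, and the helper functions below are assumptions for illustration. Both directions are derived from the one list.

```python
# Hypothetical group-level metadata: one ordered list instead of per-array links.
group_attrs = {
    "multiscale": {
        "method": "halving",
        "dimensions": [0, 1],
        "levels": ["Resolution_0", "Resolution_1", "Resolution_2"],
    }
}


def parent_of(attrs, key):
    """Return the next-higher resolution level, or None for full resolution."""
    levels = attrs["multiscale"]["levels"]
    i = levels.index(key)
    return levels[i - 1] if i > 0 else None


def summaries_of(attrs, key):
    """Return every lower-resolution level derived, directly or not, from `key`."""
    levels = attrs["multiscale"]["levels"]
    return levels[levels.index(key) + 1:]


def lowest(attrs):
    """The lowest resolution is simply the last listed level."""
    return attrs["multiscale"]["levels"][-1]


print(parent_of(group_attrs, "Resolution_1"))  # Resolution_0
print(lowest(group_attrs))                     # Resolution_2
```

Adding a level is a single append to `levels`; no array's own metadata changes. The sketch assumes one linear chain sharing a method and dimensions; branching relationships would need a richer structure.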
Related reading
- N5MultiScaleSource
- BigDatViewerN5Demo
- Imaris File Format
- http://openmicroscopy.github.io/design/OME005/ (design discussion in "Storage of pyramid data in OME-TIFF", ome/design#74)
Possible synonyms / Related concepts
- Global lossy compression
- Progressive compression
- Pyramidal images
- Sub-resolutions
- Summary views