WIP: Multiscale use-case #23

@joshmoore

Motivation

In imaging applications, especially interactive ones, the usability of a data array is greatly increased by having pre-computed sub-resolutions of the array. For example, an array of size (10**4, 10**4) might have halving steps pre-computed, providing arrays with side lengths of 5000, 2500, 1250, 625, 312, etc. Users can quickly load a low-resolution representation to choose which regions are worth loading at higher or even full resolution. A few examples of this trend in imaging file formats are provided under Related reading.
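The halving arithmetic can be sketched in a few lines of Python. The function name `halving_shapes` and the `min_size` cutoff are assumptions made for illustration only; nothing here is part of the zarr spec. Odd sizes are rounded down, which is how 625 becomes 312.

```python
def halving_shapes(shape, min_size=256):
    """Return the shapes obtained by repeatedly halving every dimension.

    Odd sizes are rounded down (floor division). Halving stops as soon as
    any dimension would drop below `min_size` (a hypothetical cutoff).
    """
    shapes = []
    current = tuple(shape)
    while True:
        current = tuple(n // 2 for n in current)
        if min(current) < min_size:
            break
        shapes.append(current)
    return shapes


# Example: sub-resolution shapes for a (10**4, 10**4) array.
levels = halving_shapes((10**4, 10**4), min_size=256)
for level, level_shape in enumerate(levels, start=1):
    print(f"Resolution_{level}: {level_shape}")
```

Stopping at a minimum size is only one possible policy; a viewer might just as well stop once a level fits in a single chunk.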

The current zarr spec has the following issues when trying to naively specify such sub-resolutions:

  • Arrays of differing size can only represent the individual resolutions by naming convention
    ("Resolution_0", "Resolution_1", etc.). This issue exists in a number of existing formats.
  • Storing data of differing dimensions in the same chunk is not intended.
  • Even if data of differing dimensions (compression)

Generalization

In other domains, a generalization of this functionality might enable "summary data" to be stored,
where a function, e.g. averaging, has been applied along a given dimension. This is usually most
beneficial when the function is sufficiently time-costly that it's worth trading storage for speed.
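As a rough illustration of "a function applied along a given dimension", the sketch below reduces a small 2-D list along its second dimension in blocks. The helper name `summarize_rows` and the `factor` parameter are hypothetical, and plain lists stand in for real arrays.

```python
from statistics import fmean


def summarize_rows(rows, factor, func=fmean):
    """Reduce a 2-D list along dimension 1.

    `func` is applied to consecutive blocks of `factor` values in each row.
    A trailing partial block, if any, is summarized on its own.
    """
    return [
        [func(row[i:i + factor]) for i in range(0, len(row), factor)]
        for row in rows
    ]


data = [
    [1.0, 3.0, 5.0, 7.0],
    [2.0, 4.0, 6.0, 8.0],
]

averaged = summarize_rows(data, factor=2)           # block-wise mean
peaks = summarize_rows(data, factor=2, func=max)    # block-wise maximum
print(averaged, peaks)
```

Swapping `func` is the point of the generalization: the stored summary need not be a spatial downsampling at all.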

Potential implementations

Filter / Memory-layout

Each chunk could be passed to a function which stores or reads the multiscale representation
with a given chunk. (TBD)

Array relationships

Metadata on a given array could specify one or both of its inheritance relationships (parent and child) to other arrays.
For example, if a child array links to its parent, it might store the following metadata:

{
    "summary_of": {
        "key": "Resolution_0",
        "method": "halving",
        "dimensions": [0, 1]
    }
}
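A minimal sketch of how such parent links might be generated for a chain of arrays follows. The helper names `summary_of` and `chain_metadata` are made up for this example; only the "summary_of", "key", "method" and "dimensions" fields come from the proposal above, and how the result would be attached to an array is left open.

```python
import json


def summary_of(parent_key, method="halving", dimensions=(0, 1)):
    """Build the proposed `summary_of` metadata pointing at a parent array."""
    return {
        "summary_of": {
            "key": parent_key,
            "method": method,
            "dimensions": list(dimensions),
        }
    }


def chain_metadata(n_levels, prefix="Resolution_"):
    """Map each child array name to metadata linking it to the previous level.

    Level 0 is the full-resolution array and therefore gets no entry.
    """
    return {
        f"{prefix}{level}": summary_of(f"{prefix}{level - 1}")
        for level in range(1, n_levels)
    }


attrs = chain_metadata(3)
print(json.dumps(attrs["Resolution_1"], indent=4))
```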

One issue with only having the parent relationship defined is how one determines the lowest
resolution. The child relationships could be represented with:

{
    "summarized_by": [
        {
            "key": "Resolution_1",
            "method": "halving",
            "dimensions": [0, 1]
        }, ...

    ]
}

but this would require updating source arrays when creating a summary.
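One way to avoid writing child links at all would be to derive them by scanning the parent links, at the cost of reading every array's metadata. The sketch below assumes the parent links have already been collected into a plain dict (`parent_of`, child name to parent name); both helper names are hypothetical.

```python
def derive_summarized_by(parent_of):
    """Invert child -> parent links into parent -> [children]."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    return children


def lowest_resolution(parent_of):
    """Return the arrays that no other array summarizes, i.e. the leaves."""
    parents = set(parent_of.values())
    return sorted(name for name in parent_of if name not in parents)


# Parent links as they might be read from each array's `summary_of` entry.
parent_of = {
    "Resolution_1": "Resolution_0",
    "Resolution_2": "Resolution_1",
}

print(derive_summarized_by(parent_of))
print(lowest_resolution(parent_of))
```

This keeps source arrays untouched when a summary is added, but every lookup of the lowest resolution requires listing all arrays first.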

An alternative would be to provide a single source of metadata on the relationships between arrays.
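To make the single-source idea concrete, here is one possible shape for such a document together with a helper that orders the levels. The "summaries", "source" and "target" keys and the `ordered_levels` function are assumptions invented for this sketch, not an agreed layout.

```python
# Hypothetical relationship document, e.g. stored once at the group level.
relationships = {
    "summaries": [
        {"source": "Resolution_0", "target": "Resolution_1",
         "method": "halving", "dimensions": [0, 1]},
        {"source": "Resolution_1", "target": "Resolution_2",
         "method": "halving", "dimensions": [0, 1]},
    ]
}


def ordered_levels(doc):
    """List array names from full resolution down to the lowest resolution.

    Assumes the entries form a single linear chain.
    """
    next_of = {entry["source"]: entry["target"] for entry in doc["summaries"]}
    targets = set(next_of.values())
    roots = [source for source in next_of if source not in targets]
    if len(roots) != 1:
        raise ValueError("expected exactly one full-resolution array")
    order = [roots[0]]
    while order[-1] in next_of:
        nxt = next_of[order[-1]]
        if nxt in order:
            raise ValueError("cycle in summary relationships")
        order.append(nxt)
    return order


levels_in_order = ordered_levels(relationships)
print(levels_in_order[-1])  # name of the lowest-resolution array
```

With this layout, adding a summary means updating one document rather than the source array, and the lowest resolution can be found without scanning every array.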

Related reading

Possible synonyms / Related concepts

  • Global lossy compression
  • Progressive compression
  • Pyramidal images
  • Sub-resolutions
  • Summary views
