Zstd decompression fails due to unknown frame content size #625

Closed
mkitti opened this issue Jul 25, 2024 · 5 comments · Fixed by #639

Comments

@mkitti
Contributor

mkitti commented Jul 25, 2024

Neuroglancer appears to be having difficulty decompressing a Zstd compressed dataset:
https://neuroglancer-demo.appspot.com/#!https://s3proxy.janelia.org/shroff-public/shroff_20220523_Janelia_embryo2/20220523i_Janelia_embryo2_OTO_6x6x6nm_2xbin.zstd.zarr/s6/ng.json

The Google Chrome developer tools console reports the following error:

Error retrieving chunk [object Object]:0,0,0: Error: Invalid typed array length: 4294967226

Raw (uncompressed) or Gzip compressed Zarr v2 datasets seem to load just fine:

Raw (uncompressed) Zarr v2 dataset:
https://neuroglancer-demo.appspot.com/#!https://s3proxy.janelia.org/shroff-public/shroff_20220523_Janelia_embryo2/20220523i_Janelia_embryo2_OTO_6x6x6nm_2xbin.raw.zarr/s6/ng.json

Gzip compressed Zarr v2 dataset:
https://neuroglancer-demo.appspot.com/#!https://s3proxy.janelia.org/shroff-public/shroff_20220523_Janelia_embryo2/20220523i_Janelia_embryo2_OTO_6x6x6nm_2xbin.gzip.zarr/s6/ng.json

The value 4294967226 is the 32-bit value 0xFFFFFFBA, the truncation of the 64-bit value 0xFFFFFFFFFFFFFFBA, which is the Zstd error code for "Destination buffer is too small".
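
For reference, here is a minimal sketch (assuming a 64-bit platform, so that size_t is 64 bits) of how the zstd error API decodes that value; the literal is the one from the console message above:

```cpp
#include <cstdio>
#include <zstd.h>
#include <zstd_errors.h>  // for ZSTD_getErrorCode

int main() {
    // The 64-bit value behind the truncated 0xFFFFFFBA seen in the console.
    size_t ret = 0xFFFFFFFFFFFFFFBAULL;  // == (size_t)-70 on a 64-bit platform
    if (ZSTD_isError(ret)) {
        // Prints: zstd error 70: Destination buffer is too small
        std::printf("zstd error %u: %s\n",
                    (unsigned)ZSTD_getErrorCode(ret), ZSTD_getErrorName(ret));
    }
    return 0;
}
```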

The dataset was encoded by Tensorstore 0.1.62 via the conda-forge package (build py312h7e2185d_0) with zstd version 1.5.6 (build ha6fb4c9_0).

@mkitti
Contributor Author

mkitti commented Jul 25, 2024

This Zstandard-compressed copy of the dataset works:
https://neuroglancer-demo.appspot.com/#!https://s3proxy.janelia.org/shroff-public/shroff_20220523_Janelia_embryo2/20220523i_Janelia_embryo2_OTO_6x6x6nm_2xbin.zstd.zarr.mark/s6/ng.json

The problem is that ZSTD_getFrameContentSize returns 0xffffffffffffffff (ZSTD_CONTENTSIZE_UNKNOWN) for the chunk encoded by Tensorstore. This is because the streaming API is used and ZSTD_CCtx_setPledgedSrcSize is not called to hint the content size, so the content size is not encoded in the frame header. The chunk at the following URL does not have an encoded content size in its header:
https://neuroglancer-demo.appspot.com/#!https://s3proxy.janelia.org/shroff-public/shroff_20220523_Janelia_embryo2/20220523i_Janelia_embryo2_OTO_6x6x6nm_2xbin.zstd.zarr.mark/s6/0/0/0.bad
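
For what it's worth, here is a minimal sketch of what pledging the size looks like with the streaming API; the wrapper function name and the single-shot ZSTD_e_end call are illustrative assumptions, not tensorstore's actual code:

```cpp
#include <zstd.h>

// Sketch: pledge the uncompressed size before streaming compression so
// that zstd writes the content size into the frame header.
// dst should be at least ZSTD_compressBound(srcSize) bytes for the
// single-call pattern below to complete in one step.
size_t compress_with_pledged_size(void* dst, size_t dstCapacity,
                                  const void* src, size_t srcSize) {
    ZSTD_CCtx* cctx = ZSTD_createCCtx();
    // Must be called before any data is consumed by this compression session.
    ZSTD_CCtx_setPledgedSrcSize(cctx, srcSize);

    ZSTD_inBuffer input = {src, srcSize, 0};
    ZSTD_outBuffer output = {dst, dstCapacity, 0};
    size_t remaining = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_end);
    ZSTD_freeCCtx(cctx);
    if (remaining != 0) return 0;  // 0 signals failure in this sketch
    return output.pos;             // compressed size
}
```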

A chunk with a correctly encoded content size in the header is located at the following URL:
https://neuroglancer-demo.appspot.com/#!https://s3proxy.janelia.org/shroff-public/shroff_20220523_Janelia_embryo2/20220523i_Janelia_embryo2_OTO_6x6x6nm_2xbin.zstd.zarr.mark/s6/0/0/0.good

numcodecs.js does not handle the return value of ZSTD_getFrameContentSize properly. It passes the unchecked value directly to malloc:
https://github.com/manzt/numcodecs.js/blob/b9a8ca932240a6de9ad6c00f9820ab5e6bbca4c1/codecs/zstd/zstd_codec.cpp#L28-L29
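
Here is a sketch of the kind of guard that is missing there (names are illustrative, not the actual numcodecs.js code):

```cpp
#include <zstd.h>

// Determine a safe destination buffer size from a zstd frame header,
// refusing to allocate when the frame does not declare its content size.
bool frame_content_size(const void* src, size_t srcSize, size_t* dstSize) {
    unsigned long long size = ZSTD_getFrameContentSize(src, srcSize);
    if (size == ZSTD_CONTENTSIZE_ERROR)
        return false;  // not a valid zstd frame
    if (size == ZSTD_CONTENTSIZE_UNKNOWN)
        return false;  // header has no size; fall back to streaming decompression
    *dstSize = static_cast<size_t>(size);  // now safe to pass to malloc
    return true;
}
```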

Suggested actions for remediation:

  1. Neuroglancer should communicate the expected chunk size to decoding codecs. Chunk data should not be able to dictate the destination buffer size without constraint.
  2. numcodecs.js should correctly handle ZSTD_CONTENTSIZE_UNKNOWN and ZSTD_CONTENTSIZE_ERROR rather than just passing the result to malloc.
  3. Tensorstore should encode the content size in the Zstd header by using ZSTD_CCtx_setPledgedSrcSize.

@mkitti changed the title from "Zarr v2 zstd decompression" to "Zstd decompression fails due to unknown frame content size" Jul 26, 2024
@jbms
Collaborator

jbms commented Jul 26, 2024

Thanks for the investigation!

Suggested actions for remediation:

  1. Neuroglancer should communicate the expected chunk size to decoding codecs. Chunk data should not be able to dictate the destination buffer size without constraint.

Agreed in the typical case when the size is known, but with zarr v3 chained codecs or future support for variable-length strings, that isn't always possible.

  2. numcodecs.js should correctly handle ZSTD_CONTENTSIZE_UNKNOWN and ZSTD_CONTENTSIZE_ERROR rather than just passing the result to malloc.

Agreed.

  3. Tensorstore should encode the content size in the Zstd header by using ZSTD_CCtx_setPledgedSrcSize.

Agreed when possible.

@mkitti
Contributor Author

mkitti commented Aug 14, 2024

manzt/numcodecs.js#47 was merged earlier today and released as v0.3.2. This release includes Zstd streaming decompression, which does not require a known frame content size in the Zstd header.
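
For context, here is a minimal sketch of that streaming approach against the zstd C API; the actual numcodecs.js implementation differs in detail:

```cpp
#include <vector>
#include <zstd.h>

// Streaming decompression that works without a content size in the
// frame header, growing the output as chunks are produced.
std::vector<char> decompress_stream(const void* src, size_t srcSize) {
    ZSTD_DCtx* dctx = ZSTD_createDCtx();
    std::vector<char> dst;
    std::vector<char> buf(ZSTD_DStreamOutSize());  // recommended output chunk size

    ZSTD_inBuffer input = {src, srcSize, 0};
    for (;;) {
        ZSTD_outBuffer output = {buf.data(), buf.size(), 0};
        size_t ret = ZSTD_decompressStream(dctx, &output, &input);
        if (ZSTD_isError(ret)) break;  // real code should surface the error
        dst.insert(dst.end(), buf.data(), buf.data() + output.pos);
        if (ret == 0) break;  // frame fully decoded
        if (input.pos == input.size && output.pos == 0) break;  // truncated input
    }
    ZSTD_freeDCtx(dctx);
    return dst;
}
```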

@mkitti
Contributor Author

mkitti commented Sep 5, 2024

@jbms I helped to address point 2 above by implementing streaming decompression in numcodecs.js 0.3.2. Bumping the version as in #639 would help alleviate the immediate problem of neuroglancer not being able to read some datasets created by tensorstore.

Let me know if you think more needs to be done there. Next, I was going to try to dig into point 3 to figure out why the pledged size is not making it into the frame header when tensorstore uses Zstandard.

@jbms closed this as completed in #639 Oct 21, 2024
@jbms
Collaborator

jbms commented Oct 21, 2024

This issue actually exists in tensorstore only with zarr v2 and n5, not with zarr v3. The internal API in tensorstore used for zarr v2 codecs does not easily allow the pledged size to be set, but I am planning to refactor to eliminate the separate zarr v2 codec implementations and use the zarr v3 codec implementations instead (which provide a way to propagate the known size information); that will fix the issue for both zarr v2 and n5.
