Split traits to get image bytes and metadata bytes #79

kylebarron · 2025-03-24T22:37:26Z

This allows for separate caching/buffering of image data and metadata.

Closes #78. I think this probably also closes #76.

cc @feefladder what do you think?

feefladder · 2025-03-25T16:59:11Z

I think it's a good change, but the metadata bytes differ greatly from COG to COG, from a few kB to a few MB, so it's impossible for any middleware to be aware of that, unless we tell it how big tag data is.

I do think it's a good change, please don't misunderstand, but here's the point:

- Africa-scale 30 m maps: 4-5 MB
- small-scale heightmap: 32 kB

What middleware could possibly account for both of these files? I think none, unless #76 lands. In #74, and the corresponding example, I used:

PrefetchReader<(Metadata)CacheReader<ReqwestReader>>, which worked quite well (most COG IFDs were read in 2 requests). The only problem was that CacheReader also duplicated and cached the prefetched bytes, which is a bit pointless. That would be solved with this PR, with my proposed changes.

kylebarron · 2025-03-25T17:19:44Z

> unless we tell it how big tag data is

Who is we? The developer at compile time or the end user at runtime?

> What middleware could possibly account for both of these files?

Sure you can; you can have an exponential metadata reader. Start with 32kb; if you end up needing more than 32kb while still reading metadata you read the next 256kb. If you need more than that you read the next 2MB, and so on. This is still all at the AsyncFileReader layer.
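A minimal sketch of the exponential metadata reader described above, assuming a synchronous, in-memory stand-in for the remote file. `CountingSource`, `ExponentialMetadataReader`, and their methods are hypothetical names for illustration and are not the crate's `AsyncFileReader` API:

```rust
use std::ops::Range;

/// Hypothetical stand-in for a remote file; counts range requests.
struct CountingSource {
    data: Vec<u8>,
    requests: usize,
}

impl CountingSource {
    /// Returns the requested bytes, clamped to the end of the file.
    fn fetch(&mut self, range: Range<usize>) -> Vec<u8> {
        self.requests += 1;
        let end = range.end.min(self.data.len());
        let start = range.start.min(end);
        self.data[start..end].to_vec()
    }
}

/// Buffers the head of the file and extends the buffer in growing steps.
struct ExponentialMetadataReader {
    source: CountingSource,
    buffer: Vec<u8>,   // holds bytes [0, buffer.len()) of the file
    next_fetch: usize, // size of the next extension request
    growth: usize,     // multiplier applied after every extension
}

impl ExponentialMetadataReader {
    fn new(source: CountingSource, initial: usize, growth: usize) -> Self {
        Self { source, buffer: Vec::new(), next_fetch: initial, growth }
    }

    /// Serves metadata from the buffer, fetching larger and larger
    /// extensions until the requested range is covered.
    fn get_metadata_bytes(&mut self, range: Range<usize>) -> Vec<u8> {
        while self.buffer.len() < range.end {
            let start = self.buffer.len();
            let chunk = self.source.fetch(start..start + self.next_fetch);
            if chunk.is_empty() {
                break; // reached the end of the file
            }
            self.buffer.extend_from_slice(&chunk);
            self.next_fetch *= self.growth;
        }
        let end = range.end.min(self.buffer.len());
        let start = range.start.min(end);
        self.buffer[start..end].to_vec()
    }
}
```

With `initial = 32 * 1024` and `growth = 8`, the successive request sizes are 32 KiB, 256 KiB, and 2 MiB, which matches the sequence in the comment. The growth factor is an assumption read off those numbers.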

> I think none, unless #76 is landed

#76 only works on a single tag at a time, so I don't see how it solves the problem.

kylebarron · 2025-03-25T17:21:02Z

> PrefetchReader<(Metadata)CacheReader<ReqwestReader>>

With this PR, we can have a MetadataPrefetchReader and an ImageCacheReader, so that there's no double fetching or caching going on.
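A rough sketch of why separate entry points avoid the double caching, assuming a simplified synchronous trait. `SplitReader`, `InMemory`, and `Layered` are hypothetical names, not the crate's actual types. Metadata requests go through a one-time header prefetch, image requests go through a tile cache, and neither path touches the other's storage:

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical trait with separate entry points for metadata and image bytes.
trait SplitReader {
    fn get_metadata_bytes(&mut self, range: Range<usize>) -> Vec<u8>;
    fn get_image_bytes(&mut self, range: Range<usize>) -> Vec<u8>;
}

/// In-memory stand-in for a remote file; counts range requests.
struct InMemory {
    data: Vec<u8>,
    requests: usize,
}

impl InMemory {
    fn fetch(&mut self, range: Range<usize>) -> Vec<u8> {
        self.requests += 1;
        let end = range.end.min(self.data.len());
        let start = range.start.min(end);
        self.data[start..end].to_vec()
    }
}

impl SplitReader for InMemory {
    fn get_metadata_bytes(&mut self, range: Range<usize>) -> Vec<u8> {
        self.fetch(range)
    }
    fn get_image_bytes(&mut self, range: Range<usize>) -> Vec<u8> {
        self.fetch(range)
    }
}

/// Prefetches the file header for metadata and caches image ranges separately.
struct Layered<R: SplitReader> {
    inner: R,
    prefetch_len: usize,
    header: Option<Vec<u8>>,                 // filled on first metadata read
    tiles: HashMap<(usize, usize), Vec<u8>>, // image ranges only
}

impl<R: SplitReader> Layered<R> {
    fn new(inner: R, prefetch_len: usize) -> Self {
        Self { inner, prefetch_len, header: None, tiles: HashMap::new() }
    }
}

impl<R: SplitReader> SplitReader for Layered<R> {
    fn get_metadata_bytes(&mut self, range: Range<usize>) -> Vec<u8> {
        if range.end > self.prefetch_len {
            // Outside the prefetched header: pass through without caching.
            return self.inner.get_metadata_bytes(range);
        }
        if self.header.is_none() {
            self.header = Some(self.inner.get_metadata_bytes(0..self.prefetch_len));
        }
        let header = self.header.as_ref().unwrap();
        let end = range.end.min(header.len());
        let start = range.start.min(end);
        header[start..end].to_vec()
    }

    fn get_image_bytes(&mut self, range: Range<usize>) -> Vec<u8> {
        let key = (range.start, range.end);
        if let Some(hit) = self.tiles.get(&key) {
            return hit.clone();
        }
        let bytes = self.inner.get_image_bytes(range);
        self.tiles.insert(key, bytes.clone());
        bytes
    }
}
```

The prefetched header never enters the tile cache, which is the duplication the nested `PrefetchReader<CacheReader<...>>` stack ran into.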

kylebarron · 2025-03-25T17:29:52Z

> Sure you can; you can have an exponential metadata reader. Start with 32kb; if you end up needing more than 32kb while still reading metadata you read the next 256kb. If you need more than that you read the next 2MB, and so on. This is still all at the AsyncFileReader layer.

Was chatting with @vincentsarago and apparently this is roughly the same approach that GDAL takes to read header metadata

feefladder · 2025-03-25T18:25:10Z

> > unless we tell it how big tag data is
>
> Who is we? The developer at compile time or the end user at runtime?

Dev at compile time

> > I think none, unless #76 is landed
>
> #76 only works on a single tag at a time, so I don't see how it solves the problem.

Because of how a COG is laid out, and because we read the full-res image first (all calculations are in bytes, since that is what the middleware sees):

- TileOffsets0 (Long8 in BigTiff, Long in small TIFF) is the largest array
- TileByteCounts0 (Long) = 1/2 TileOffsets0 in BigTiff
- together these two make up tile_info_0
- tile_info_1 ~= 1/4 tile_info_0
- tile_info_2 ~= 1/4 tile_info_1
- continue like that

Summing to infinity gives a geometric series: total_bytes ~= tile_info_0/(1-1/4) = 4/3*tile_info_0
- BigTiff: tile_info_0 = 3/2*TileOffsets0 => total_bytes ~= 2*TileOffsets0
- small TIFF: tile_info_0 = 2*TileOffsets0 => total_bytes ~= 8/3*TileOffsets0

Since we're only really expecting to have large tile_info arrays in BigTiff, I went with the BigTiff assumption.
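The estimate above can be checked numerically. This is a small sketch under the same assumptions as the comment (TileByteCounts stored as 4-byte Long, each overview holding about 1/4 of the previous level's tile info). The function names are made up for illustration:

```rust
/// Estimated bytes of TileOffsets + TileByteCounts over all overview levels,
/// given the byte length of the full-resolution TileOffsets array.
/// Assumes 8-byte offsets in BigTIFF, 4-byte offsets otherwise, and
/// 4-byte byte counts in both cases.
fn estimate_total_tile_info_bytes(tile_offsets0_bytes: u64, bigtiff: bool) -> u64 {
    let (offset_size, count_size): (u64, u64) = if bigtiff { (8, 4) } else { (4, 4) };
    let n_tiles = tile_offsets0_bytes / offset_size;
    let tile_info_0 = n_tiles * (offset_size + count_size);
    // Geometric series limit: tile_info_0 / (1 - 1/4) = 4/3 * tile_info_0, rounded up.
    (4 * tile_info_0 + 2) / 3
}

/// Explicit finite sum over `levels` levels, each 1/4 the size of the previous one.
fn sum_levels(tile_info_0: u64, levels: u32) -> u64 {
    (0..levels).map(|k| tile_info_0 / 4u64.pow(k)).sum()
}
```

For a BigTIFF with 3000 full-resolution tiles, TileOffsets0 is 24000 bytes and the estimate is 48000 bytes, i.e. 2*TileOffsets0. The explicit sum over any finite number of levels stays at or below that bound.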

> > Sure you can; you can have an exponential metadata reader. Start with 32kb; if you end up needing more than 32kb while still reading metadata you read the next 256kb. If you need more than that you read the next 2MB, and so on. This is still all at the AsyncFileReader layer.
>
> Was chatting with @vincentsarago and apparently this is roughly the same approach that GDAL takes to read header metadata

I think #74 does better than exponential prefetching in limiting the number of requests and not over-estimating if we're almost at the end of tile_info.

But in any case, my proposed changes could also be left out; the logic in the MetadataCacheReader would then handle both the initial prefetch and the subsequent estimates.

feefladder · 2025-03-25T18:26:08Z

aka LGTM, but please also #76 :)

kylebarron added 2 commits March 24, 2025 18:37

Split traits to get image bytes and metadata bytes

3e4b0ba

fix python compile

0393bce

feefladder mentioned this pull request Mar 25, 2025

Read tag data in its entirety from AsyncFileReader. #76

Closed

feefladder reviewed Mar 26, 2025

src/ifd.rs (outdated)

kylebarron and others added 2 commits March 26, 2025 10:50

Update docs

a60a359

Merge branch 'main' into kyle/split-traits-get-image-metadata

0c14a27

kylebarron enabled auto-merge (squash) March 26, 2025 14:52

kylebarron merged commit 4f387e1 into main Mar 26, 2025
6 checks passed

kylebarron deleted the kyle/split-traits-get-image-metadata branch March 26, 2025 14:54

kylebarron mentioned this pull request Mar 27, 2025

Read tag data in its entirety from AsyncFileReader. #81

Closed
