Skip to content

Split traits to get image bytes and metadata bytes #79

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 26, 2025

Conversation

kylebarron
Copy link
Member

@kylebarron kylebarron commented Mar 24, 2025

This allows for separate caching/buffering of image data and metadata.

Closes #78 . I think this probably also closes #76

cc @feefladder what do you think?

@feefladder
Copy link
Contributor

feefladder commented Mar 25, 2025

I think it's a good change, but the metadata bytes differ greatly from COG to COG, from a few kB to a few MB, so it's impossible for any middleware to be aware of that, unless we tell it how big tag data is.

I do think it's a good change, please don't misunderstand, but here's the point:

What middleware could possibly account for both of these files? - I think none, unless #76 is landed. In #74, and the corresponding example, I did a:

PrefetchReader<(Metadata)CacheReader<ReqwestReader>>, which worked quite well (most COG ifds read in 2 requests). The only problem there was that CacheReader also doubled and cached the prefetch, which is a bit pointless. That would be solved with this pr, with my proposed changes.

@kylebarron
Copy link
Member Author

unless we tell it how big tag data is

Who is we? The developer at compile time or the end user at runtime?

What middleware could possibly account for both of these files?

Sure you can; you can have an exponential metadata reader. Start with 32kb; if you end up needing more than 32kb while still reading metadata you read the next 256kb. If you need more than that you read the next 2MB, and so on. This is still all at the AsyncFileReader layer.

I think none, unless #76 is landed

#76 only works on a single tag at a time, so I don't see how it solves the problem.

@kylebarron
Copy link
Member Author

PrefetchReader<(Metadata)CacheReader<ReqwestReader>>

With this PR, we can have a MetadataPrefetchReader and a ImageCacheReader, so that there's no double fetching or caching going on.

@kylebarron
Copy link
Member Author

Sure you can; you can have an exponential metadata reader. Start with 32kb; if you end up needing more than 32kb while still reading metadata you read the next 256kb. If you need more than that you read the next 2MB, and so on. This is still all at the AsyncFileReader layer.

Was chatting with @vincentsarago and apparently this is roughly the same approach that GDAL takes to read header metadata

@feefladder
Copy link
Contributor

feefladder commented Mar 25, 2025

unless we tell it how big tag data is

Who is we? The developer at compile time or the end user at runtime?

Dev at compile time

I think none, unless #76 is landed

#76 only works on a single tag at a time, so I don't see how it solves the problem.

Because of how a file is laid out in a COG and we're reading the full-res image first:

all calculations are in bytes, since that is what the middleware sees
- TileOffsets0 (Long8 in big, Long in small), largest  -\_ together tile_info_0
- TileByteCounts0 (Long) = 1/2 TileOffsets0 in BigTiff -/

- tile_info_1 ~= 1/4 tile_info_0
- tile_info_2 ~= 1/4 tile_info_1
- continue like that

sum to infinity = geometric series => tile_info_0/(1-1/4) = 4/3*tile_info_0
- bigtiff: tile_info_0 = 3/2*TileOffsets0 => total_bytes ~= 2*TileOffsets0
- smalltiff: tile_info_0 = 2*TileOffsests0 => total_bytes ~= 8/3*TileOffsets0

Since we're only really expecting to have large tile_info arrays in BigTiff, I went with the BigTiff assumption.

Sure you can; you can have an exponential metadata reader. Start with 32kb; if you end up needing more than 32kb while still reading metadata you read the next 256kb. If you need more than that you read the next 2MB, and so on. This is still all at the AsyncFileReader layer.

Was chatting with @vincentsarago and apparently this is roughly the same approach that GDAL takes to read header metadata

I think #74 does better than exponential prefetching in limiting the number of requests and not over-estimating if we're almost at the end of tile_info.

But in any case, my proposed changes can also not be made, and then the logic for the MetadataCacheReader does both the initial prefetch and the subsequent estimates.

@feefladder
Copy link
Contributor

feefladder commented Mar 25, 2025

aka LGTM, but please also #76 :)

@kylebarron kylebarron enabled auto-merge (squash) March 26, 2025 14:52
@kylebarron kylebarron merged commit 4f387e1 into main Mar 26, 2025
6 checks passed
@kylebarron kylebarron deleted the kyle/split-traits-get-image-metadata branch March 26, 2025 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Split requests for metadata and image data in AsyncFileReader trait
2 participants