[feature request] Allow passing in image reading/loading function to old-style datasets constructors #4991


Closed
vadimkantorov opened this issue Nov 25, 2021 · 14 comments

Comments

@vadimkantorov

vadimkantorov commented Nov 25, 2021

🚀 The feature

This makes more sense now that torchvision is compiled with its own image reading functions (read_image), so an easier way to test pipelines without PIL would be nice.

In addition, this would allow easy stubbing out of image loading when wanted, solving #4975
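A minimal sketch of the requested shape (all names are hypothetical; old-style constructors don't accept a loader today, although ImageFolder already has a loader parameter):

```python
# Hypothetical sketch: an old-style dataset whose constructor accepts a
# pluggable `loader`, mirroring the parameter ImageFolder already has.

class OldStyleDataset:
    def __init__(self, samples, loader=None):
        self.samples = samples                      # (path, annotation) pairs
        # the real default would be PIL-based; a stub disables decoding
        self.loader = loader if loader is not None else (lambda path: None)

    def __getitem__(self, index):
        path, ann = self.samples[index]
        return self.loader(path), ann               # decoding is now pluggable

    def __len__(self):
        return len(self.samples)

# Stubbing out the loader makes annotation-only iteration cheap:
dataset = OldStyleDataset([("img0.jpg", {"label": 3})])
img, ann = dataset[0]                               # img is None, ann intact
```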

Motivation, pitch

N/A

Alternatives

No response

Additional context

No response

cc @pmeier

@vadimkantorov vadimkantorov changed the title [feature request] Allow passing in image loading function to old-style datasets constructor [feature request] Allow passing in image loading function to old-style datasets constructors Nov 25, 2021
@vadimkantorov vadimkantorov changed the title [feature request] Allow passing in image loading function to old-style datasets constructors [feature request] Allow passing in image reading/loading function to old-style datasets constructors Nov 25, 2021
@datumbox
Contributor

@vadimkantorov Thanks for the proposal.

It's unlikely that we will add many new features to the old-style datasets, but this is certainly good feedback for the prototype ones. As far as I understand, the new API has a mechanism (decoder) for using different readers, but it seems that only PIL is supported for now.

@pmeier Is supporting TorchVision's io within your current plans?

@pmeier
Collaborator

pmeier commented Nov 25, 2021

Yes, the new datasets will use the decoder parameter to do just that. You can pass any callable to datasets.load(..., decoder=) that takes an open file handle and returns a tensor. torchvision.io.read_image currently takes a path rather than a file handle, so we are using PIL by default for now.

There are still some design choices to be made about what a decoder is and what it should return. For example, how do we handle the case where more than one type of file needs to be decoded? Furthermore, how do we handle multiple return values, for example the audio and video tensors produced by decoding a video?
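To illustrate, a decoder under this contract is just a callable on an open file handle. A dependency-free sketch (a real decoder would return a tensor; bytes are used here to keep the example self-contained):

```python
import io

# Sketch of the decoder contract described above: any callable that
# takes an open binary file handle. Returning None disables decoding.

def raw_decoder(file):
    return file.read()          # "decode" by returning the raw payload

def no_decoder(file):
    return None                 # skip decoding entirely

payload = io.BytesIO(b"\x89PNG fake image bytes")
assert raw_decoder(payload) == b"\x89PNG fake image bytes"
assert no_decoder(io.BytesIO(b"...")) is None

# hypothetical call shape from the comment above:
# datasets.load("voc", decoder=no_decoder)
```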

@vadimkantorov
Author

The proposed change is simple, low-risk, and immediately useful to hundreds of people working with the existing dataset APIs; it will take some time until the new ones are adopted by the community.

@pmeier
Collaborator

pmeier commented Dec 9, 2021

Unfortunately, I don't think the change is simple. The functionality you propose is implemented in the new-style datasets, but we already ran into trouble there and I see more to come in the future. I've opened #5075 for a discussion of how we want to handle decoding in the future. While we will find a solution for the new-style datasets, I'm not eager to also maintain the same functionality in the old-style ones.

From your request I get that you only want to disable decoding, not use a custom decoder. If you don't care about the actual data and only use it for testing, can't you simply patch it out?

from unittest import mock

from torchvision import datasets

dataset = datasets.VOCDetection(...)

print(dataset[0][0])
# <PIL.Image.Image image mode=RGB size=500x442 at 0x7F705C1E8A90>

with mock.patch("torchvision.datasets.voc.Image.open"):
    print(dataset[0][0])
# <MagicMock name='open().convert()' id='140119809177920'>

With the patch, iterating over the complete dataset takes ~1.5 seconds on my machine.

@vadimkantorov
Author

Yes, of course I have ways around it, as mentioned above, but it's cumbersome.

What kind of trouble could there be if one just accepts a lambda that reads an image by path? Or at least moves the loading into semi-standard private methods that are elementary to patch?

@pmeier
Collaborator

pmeier commented Dec 9, 2021

As detailed in #5075:

  1. The current signature assumes that there is only one type of payload to decode in a dataset, i.e. images or videos. Other types, for example annotation files stored as .mat, .xml, or .flo, will always be decoded. Thus, the user can't completely deactivate decoding after all. Furthermore, they also can't use custom decoding for these types if need be.
  2. The current signature assumes that all payloads of a single type can be decoded by the same decoder. A counterexample is the HD1K optical flow dataset, which uses 16-bit .png images as annotations; these have sub-par support in Pillow.

@vadimkantorov
Author

vadimkantorov commented Dec 9, 2021

About (1): in the context of old-style datasets, one can start by accepting an image_loader in the constructor or by using an _image_loader function; this alone greatly speeds up the scenario of iterating/indexing a dataset without image loading/decoding. If wanted, more loading functions could be moved into private functions such as _load_xml or _load_flo; these are relatively simple to patch out as well if one doesn't want to accept them in the constructor (although that would also work just fine).

About (2): an old-style dataset can have a suitable default loading function, used when None is passed or when _image_loader isn't redefined; there's no assumption that the same decoder would be used for all datasets. If it works now, it would work just fine after moving the image loading logic into a _load_image function.
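The _load_image extension point suggested here can be sketched as follows (VOCDetectionLike and AnnotationsOnly are hypothetical stand-ins, not torchvision classes):

```python
# Hypothetical sketch of the proposed `_load_image` extension point:
# a one-line override disables decoding, no mock framework required.

class VOCDetectionLike:
    def __init__(self, samples):
        self.samples = samples                  # (path, annotation) pairs

    def _load_image(self, path):
        raise RuntimeError("would call PIL here")  # real PIL loading

    def __getitem__(self, index):
        path, ann = self.samples[index]
        return self._load_image(path), ann

class AnnotationsOnly(VOCDetectionLike):
    def _load_image(self, path):
        return None                             # skip decoding entirely

ds = AnnotationsOnly([("img0.jpg", {"objects": 2})])
assert ds[0] == (None, {"objects": 2})
```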

@pmeier
Collaborator

pmeier commented Dec 9, 2021

this alone greatly speeds up the scenario of iterating/indexing a dataset without image loading/decoding.

Our datasets are map-style datasets. Thus, they are fully indexed as soon as they are instantiated. What kind of information do you want to extract that is not available at that stage?

an old-style dataset can have a suitable default loading function, used when None is passed or when _image_loader isn't redefined; there's no assumption that the same decoder would be used for all datasets. If it works now, it would work just fine after moving the image loading logic into a _load_image function.

My point is that this solution will not work for all datasets, as detailed in the other issue. I don't want one-off solutions for a problem that already has a workaround and will be solved in a more general way in the future.

I'm ok with a solution like the one you proposed in #4975, because it doesn't increase the API surface. Still, this new "feature" won't be available immediately. The next release is planned for early March, which is also when we plan on moving the new-style datasets out of the prototype state.

@vadimkantorov
Author

I don't want one-off solutions for a problem that already has a workaround and will be solved in a more general way in the future.

The vast majority of code, packages, and projects in development uses old-style datasets. It's your call of course, but adding such small features, even if they don't "solve data loading", is useful and helpful for a lot of people.

@pmeier
Collaborator

pmeier commented Dec 9, 2021

What exactly is the use case? As stated above, the datasets are already indexed after they are instantiated. What kind of data do you want to extract?

@vadimkantorov
Author

E.g. I want to access only the instance annotations without spending time on loading and decoding the images, i.e. I need an indexable dataset instance that's fast. Currently I hack around this, but it would be much better if the datasets were more amenable to this scenario (e.g. accepted an image loader as a constructor argument or used private _load_image methods that are not hard to patch out).

@pmeier
Collaborator

pmeier commented Dec 10, 2021

I want to access only the instance annotations without spending time on loading and decoding the images, i.e. I need an indexable dataset instance that's fast.

If you regularly do this, wouldn't it be a lot better to store the annotations in an indexable way and compute everything you need based on that? That would also save you from instantiating the dataset and decoding the annotations:

import pickle
from unittest import mock

from torchvision import datasets

dataset = datasets.VOCDetection(...)
ann_file = "voc_annotations.pt"

# dump only the annotations, with image decoding patched out
with mock.patch("torchvision.datasets.voc.Image.open"):
    with open(ann_file, "wb") as file:
        pickle.dump(dict(enumerate([ann["annotation"] for _, ann in dataset])), file)

# later runs can load the cached annotations directly
with open(ann_file, "rb") as file:
    anns = pickle.load(file)

anns[3141]

Currently I hack around this, but it would be much better if the datasets were more amenable to this scenario

True, and this is why the new-style datasets will solve this.

(e.g. accepted an image loader as a constructor argument or used private _load_image methods that are not hard to patch out)

If you are ok with patching a private method at runtime, I fail to see why my workaround from #4991 (comment) is too "cumbersome".

@vadimkantorov
Author

vadimkantorov commented Dec 10, 2021

If you are ok with patching a private method at runtime, I fail to see why my workaround from #4991 (comment) is too "cumbersome".

Introducing a whole mock concept just to disable image loading is overkill IMO, compared to providing a super-simple extension point (method override/patch). It also requires knowing the mock framework to do this simple task. Of course, it's a matter of personal preference. If that's the recommended way, it would be best to have it in the docs, as the use case is frequent when writing evaluation code that relies on the existing annotation reading / transform code in the dataset. Does the mock still apply if we create an object inside the with block and then use the object's indexing outside of it?

True, and this is why the new-style datasets will solve this.

All existing code and projects / examples / tutorials / university lab exercises use the old datasets and will continue to do so for a long time. New projects will probably benefit from the new functionality, but a lot of users will continue to use the old datasets for quite some time.

ann_file = "voc_annotations.pt"

Caching always comes at the cost of keeping track of whether the cached file was created with the right hyper-parameters (e.g. transforms may change), and pickle can't save lambdas.

The COCO dataset already has such a _load_image method; it would be nice if other datasets adopted this. That would solve the use case.

@pmeier
Collaborator

pmeier commented Dec 10, 2021

Does the mock still apply if we create an object inside the with block and then use the object's indexing outside of it?

The mock stays in place until you exit the context manager. Afterwards you'll have the normal behavior again. You can also start and stop it without a context manager:

patcher = mock.patch(...)
patcher.start()
...
patcher.stop()
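A runnable sketch of the start/stop form, patching time.time as a stand-in for the "torchvision.datasets.voc.Image.open" target used earlier in the thread:

```python
import time
from unittest import mock

# start()/stop() patching without a context manager; extra keyword
# arguments to mock.patch configure the replacement MagicMock.
patcher = mock.patch("time.time", return_value=0.0)
patcher.start()
assert time.time() == 0.0       # patched: the mock is in place
patcher.stop()
assert time.time() > 0.0        # stopped: the real function is back
```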

All existing code and projects / examples / tutorials / university lab exercises use the old datasets and will continue to do so for a long time. New projects will probably benefit from the new functionality, but a lot of users will continue to use the old datasets for quite some time.

The current plan is to remove the old datasets once the new ones are stable, plus one deprecation release. Thus, if users want to use the latest releases, they will have to move eventually.

The coco dataset already has such a _load_image method, it would be nice if other datasets adopted this. This will solve the usecase.

If you want to have such a method, I'm not going to oppose it if it is a 1-to-1 copy of the current behavior. I don't think there will be any changes to the old datasets that would break this in the future, but since it is a private method, I'm not going to guarantee it. Given that this feature is already requested in #4975, I'm going to close this issue.
