
[RFC] How should datasets handle decoding of files? #5075


Open
pmeier opened this issue Dec 9, 2021 · 7 comments

@pmeier
Collaborator

pmeier commented Dec 9, 2021

It is a common feature request (for example #4991) to be able to disable the decoding when loading a dataset. To solve this we added a decoder keyword argument to the load mechanism (torchvision.prototype.datasets.load(..., decoder=...)). It takes an Optional[Callable] with the following signature:

def my_decoder(buffer: BinaryIO) -> torch.Tensor: ...

If it is a callable, it will be passed a buffer from the dataset and the result will be integrated into the sample dictionary. If the decoder is None, the buffer itself is placed in the sample dictionary, leaving the decoding to the user.

return dict(
    path=path,
    image=decoder(buffer) if decoder else buffer,
)

This works well for images, but already breaks down for videos as discovered in #4838. The issue is that decoding a video results in more information than a single tensor. The tentative plan in #4838 was to change the signature to

def my_decoder(buffer: BinaryIO) -> Dict[str, Any]: ...

With this, a decoder can now return arbitrary information, which can be integrated in the top level of the sample dictionary.
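As a toy illustration of that signature, here is a hedged sketch of a dict-returning video decoder. Everything in it (the function name, the keys, the placeholder values) is hypothetical; a real implementation would decode via torchvision.io and return frame tensors:

```python
import io
from typing import Any, BinaryIO, Dict

def my_video_decoder(buffer: BinaryIO) -> Dict[str, Any]:
    # Toy stand-in: a real implementation would call into torchvision.io
    # and return decoded frame tensors plus the audio track.
    payload = buffer.read()
    return {
        "frames": [payload],  # placeholder for a (T, C, H, W) tensor
        "audio": None,        # placeholder for the audio track
        "video_fps": 30.0,    # placeholder metadata
    }

# The decoder output is merged into the top level of the sample dictionary.
sample = {"path": "clip.mp4"}
sample.update(my_video_decoder(io.BytesIO(b"\x00\x01")))
```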

Unfortunately, looking ahead, I don't think even this architecture will be sufficient. Two issues came to mind:

  1. The current signature assumes that there is only one type of payload to decode in a dataset, i.e. images or videos. Other types, for example annotation files stored as .mat, .xml, or .flo, will always be decoded. Thus, the user can't completely deactivate the decoding after all. Furthermore, they also can't use any custom decoding for these types if need be.
  2. The current signature assumes that all payloads of a single type can be decoded by the same decoder. A counterexample is the HD1K optical flow dataset, which uses 16-bit .png images as annotations; these have sub-par support in Pillow.

To overcome this, I propose a new architecture that is similar to the RoutedDecoder datapipe. We should have a Decoder class that has a sequence of Handlers (name up for discussion):

from typing import Any, BinaryIO, Callable, Dict, Optional

class Decoder:
    def __init__(
        self,
        *handlers: Callable[[str, BinaryIO], Optional[Dict[str, Any]]],
        must_decode: bool = True,
    ):
        self.handlers = handlers
        self.must_decode = must_decode

    def __call__(
        self,
        path: str,
        buffer: BinaryIO,
        *,
        prefix: str = "",
        include_path: bool = True,
    ) -> Dict[str, Any]:
        for handler in self.handlers:
            output = handler(path, buffer)
            if output is not None:
                break
        else:
            if self.must_decode:
                raise RuntimeError(
                    f"No handler was responsible for decoding the file {path}."
                )
            output = {(f"{prefix}_" if prefix else "") + "buffer": buffer}

        if include_path:
            output[(f"{prefix}_" if prefix else "") + "path"] = path

        return output

If called with a path-buffer-pair the decoder iterates through the registered handlers and returns the first valid output. Thus, each handler can determine based on the path if it is responsible for decoding the current buffer. By default, the decoder will bail if no handler decoded the input. This can be relaxed by the must_decode=False flag (name up for discussion), which is a convenient way to have a non-decoder.
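The path-based dispatch can be sketched with a small self-contained example. The handler names and the dispatch loop mirror the proposal above, but everything here is hypothetical; real handlers would actually decode the buffer instead of returning the raw bytes:

```python
import io
from typing import Any, BinaryIO, Dict, Optional

def png_handler(path: str, buffer: BinaryIO) -> Optional[Dict[str, Any]]:
    # Each handler decides, based on the path, whether it is responsible.
    if not path.endswith(".png"):
        return None
    return {"image": buffer.read()}  # a real handler would decode here

def flo_handler(path: str, buffer: BinaryIO) -> Optional[Dict[str, Any]]:
    if not path.endswith(".flo"):
        return None
    return {"flow": buffer.read()}

def dispatch(handlers, path: str, buffer: BinaryIO) -> Dict[str, Any]:
    # The first handler that returns a non-None output wins.
    for handler in handlers:
        output = handler(path, buffer)
        if output is not None:
            return output
    raise RuntimeError(f"No handler was responsible for decoding the file {path}.")

out = dispatch([png_handler, flo_handler], "frame_0001.png", io.BytesIO(b"\x89PNG"))
```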

We would need to change the datasets.load function to

def load(
    ...,
    decoder: Optional[
        Union[
            Decoder,
            Callable[[str, BinaryIO], Optional[Dict[str, Any]]],
            Sequence[Callable[[str, BinaryIO], Optional[Dict[str, Any]]]],
        ]
    ] = ()
):
    ...
    if decoder is None:
        decoder = Decoder(must_decode=False)
    elif not isinstance(decoder, Decoder):
        decoder = Decoder(
            *(decoder if isinstance(decoder, collections.abc.Sequence) else (decoder,)),
            *dataset.info.handlers,
            *default_handlers,
        )
    ...

By default the user would get the dataset-specific handlers as well as the default ones. Custom handlers supplied by the user are processed with higher priority and thus override the default behavior if need be. If None is passed we get a true non-decoder. Finally, by passing a Decoder instance the user has full control over the behavior.
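The priority ordering described above can be sketched in isolation. The function below is a hypothetical stand-in for the branching inside load(); the handler names are made up for illustration:

```python
import collections.abc

def normalize_handlers(decoder, dataset_handlers=(), default_handlers=()):
    # Mirrors the proposed branching in load(): None means a true
    # non-decoder (no handlers at all); otherwise user handlers come
    # first, then dataset-specific handlers, then the defaults.
    if decoder is None:
        return []
    if not isinstance(decoder, collections.abc.Sequence):
        decoder = (decoder,)
    return [*decoder, *dataset_handlers, *default_handlers]

# Dummy handlers standing in for real decoders.
def user_handler(path, buffer): return None
def ds_handler(path, buffer): return None
def default_handler(path, buffer): return None

order = normalize_handlers(user_handler, (ds_handler,), (default_handler,))
```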

Within the dataset definition, the call to the decoder would simply look like

path, buffer = data

sample = dict(...)
sample.update(decoder(path, buffer))

or, if multiple buffers need to be decoded,

image_data, ann_data = data
image_path, image_buffer = image_data
ann_path, ann_buffer = ann_data

sample = dict()
sample.update(decoder(image_path, image_buffer, prefix="image"))
sample.update(decoder(ann_path, ann_buffer, prefix="ann"))

cc @pmeier @bjuncek

@fmassa
Member

fmassa commented Dec 14, 2021

Controversial thought: remove decoder from the datasets, and let the user handle the decoding themselves.

We can have a custom type that represents RawImage, RawVideo or something like that.

The user can then select whatever type of decoder they want and pass it in a .map call.

@pmeier
Collaborator Author

pmeier commented Dec 14, 2021

I think this is a good approach as long as we can make it work as seamlessly as what I proposed above. I currently see one issue:

If we separate the decoder from the dataset, how do we use dataset specific handlers (keeping the same terminology as in my comment above) in a non-verbose way?

With my proposal, this can be handled internally and so the dataset construction looks like this:

from torchvision.prototype import datasets

dataset = datasets.load("foo")

With your proposal, we first need to load the static info of the dataset and later construct a Decoder instance that uses the dataset specific handlers:

from torchvision.prototype import datasets

name = "foo"
info = datasets.info(name)
dataset = datasets.load(name)
dataset = dataset.map(
    datasets.decoder.Decoder(
        *info.handlers,
        *datasets.decoder.default_handlers(),
    )
)

@pmeier
Collaborator Author

pmeier commented Dec 15, 2021

One possible solution that came to mind is to attach the default decoder to RawImage or other custom types. That way we could have a Decoder datapipe that we could use like this:

from torchvision.prototype import datasets

dataset = datasets.load("foo")
dataset = datasets.utils.Decoder(dataset)

Optionally we could also use the .decode() functional call, which is currently occupied by the RoutedDecoder. IIRC, this will be removed in the future @VitalyFedyunin @ejguan? This way the call would be even simpler:

from torchvision.prototype import datasets

dataset = datasets.load("foo").decode()

@ejguan
Contributor

ejguan commented Dec 15, 2021

I have a question regarding video files.
Is there any use case where a single video file (handle) would be decoded several times to get only part of the frames from the video, like decoding a sliding window over videos?
Or are we going to decode the whole video and then do the sliding window over the decoded video frames?

> One possible solution that came to mind is to attach the default decoder to RawImage or other custom types. That way we could have a Decoder datapipe that we could use like this:

Do you want to support a custom decoder for the dataset?

> Optionally we could also use the .decode() functional call, which is currently occupied by the RoutedDecoder.

IIUC the dataset is not a DataPipe instance, so you can use decode for free.

@pmeier
Collaborator Author

pmeier commented Dec 15, 2021

> I have a question regarding video files.
> Is there any use case where a single video file (handle) would be decoded several times to get only part of the frames from the video, like decoding a sliding window over videos?
> Or are we going to decode the whole video and then do the sliding window over the decoded video frames?

cc @bjuncek

> Do you want to support a custom decoder for the dataset?

We need to. Some datasets use either non-standard files or formats with sub-par support in the default decoders. See 1. and 2. in my top post.

> IIUC the dataset is not a DataPipe instance, so you can use decode for free.

torchvision.datasets.load() gives you an IterDataPipe[Dict[str, Any]], so we are bound by the names that are already taken.

@ejguan
Contributor

ejguan commented Dec 15, 2021

Let us discuss it over the team meeting then. I think we can release it to you. Letting domains handle the corresponding decoders makes more sense to me.

@pmeier
Collaborator Author

pmeier commented Dec 22, 2021

After some more discussion, we realized there is another requirement: even without decoding, the sample dictionary should be serializable. This eliminates the possibility of using custom file wrappers as originally thought up in #5075 (comment).

Our current idea is to always read each file and store the encoded bytes in a uint8 tensor. This has two advantages:

  1. Whatever method we later use to serialize needs to handle tensors anyway, so we don't need to worry about the encoded files.
  2. If in the future we have a scriptable decoding transform, we could have end-to-end scriptability.

The only downside we are currently seeing is that we lose the ability to not load the data at all. Given that the time to read the bytes is usually dwarfed by the decoding time, we feel this is a good compromise.
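A minimal sketch of that idea, using only the standard library; the function name is hypothetical, and the raw bytes stand in for the uint8 tensor the actual implementation would produce:

```python
import io
from typing import BinaryIO

def read_encoded(buffer: BinaryIO) -> bytes:
    # Read the raw, still-encoded file contents. In the actual
    # implementation these bytes would be wrapped in a uint8 tensor,
    # e.g. torch.frombuffer(bytearray(data), dtype=torch.uint8),
    # so that whatever serializes the sample only ever sees tensors.
    return buffer.read()

# The sample carries the encoded payload; decoding happens later, if at all.
sample = {"path": "img.png", "image": read_encoded(io.BytesIO(b"\x89PNG\r\n"))}
```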

You can find a proof-of-concept implementation in #5105.
