[RFC] Hardware-accelerated video decoding #2439
Comments
Hi, thanks for opening the issue! I agree that video decoding on the GPU would be nice functionality to have, although it's not in the near-term plan for now (maybe in 6 months?).

About seeking in videos, we are preparing a revamp of the video reading abstractions, which will be more generic and allow more flexibility when performing video decoding. cc @bjuncek

For video transformations on the GPU, note that we are making all transforms in torchvision torchscriptable and able to operate on Tensors, so they will natively support the GPU if needed.
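For reference, here is a minimal sketch of what GPU-side transforms look like once frames are available as tensors. It follows the scriptable-transforms pattern from the torchvision docs; the input shape and normalization values are purely illustrative:

```python
import torch
import torchvision.transforms as T

# Compose scriptable, tensor-based transforms as an nn.Sequential.
transforms = torch.nn.Sequential(
    T.RandomCrop(224),
    T.RandomHorizontalFlip(p=0.3),
    T.ConvertImageDtype(torch.float32),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
)
scripted_transforms = torch.jit.script(transforms)

# Illustrative batch of decoded frames already on the GPU: (N, C, H, W) uint8.
frames = torch.randint(0, 256, (8, 3, 480, 640), dtype=torch.uint8, device="cuda")
out = scripted_transforms(frames)  # runs entirely on the GPU
print(out.shape, out.device)
```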
Updated description after the release of torchscriptable transforms.
We may first
Hi, the only drawback of NVIDIA VPF is that the numpy arrays end up allocated on the CPU. However, it allows you to use all the ffmpeg tools, and is thus much more flexible. I think something like this could be adapted to the current video reader that torchvision has.
Pinging this thread to check for interest. Nvidia has been working on writing their own bridge between their GPUs' encoders/decoders and popular machine learning libraries (link here), but it's still in beta. Is there any interest in making this a feature in PyTorch?

I could see something like this introducing some unintuitive edge cases, since an iterator that returns CUDA Tensors may not play well with DataLoaders that leverage multiprocessing. The goal of this feature addition, however, would be to offer developers the option to leverage the Nvidia GPU decoders without installing another dependency. And if the integration can be done through FFmpeg's C++ API, then it could probably be expanded easily to include Intel's and AMD's decoder APIs as well, assuming they're all available from the same API.
@dwrodri From my point of view it would be a really interesting tool. Working with videos on the CPU is horrible and I've run into many issues. DALI works like a charm but it's not very flexible.
Most Python libraries I've seen that interface with the decoder onboard an Nvidia GPU pawn the work off to FFmpeg's CLI. libAV, the library form of FFmpeg, offers a wrapper around the proprietary hardware accelerators they support. If I can find the time, I'd like to see if I can put together a PR for this.

DALI doesn't support variable-frame-rate video, which is unfortunate for me because I work with video containing missing frames quite often.
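For what it's worth, here is a minimal sketch of the "pawn it off to the FFmpeg CLI" approach with NVDEC. The flags are standard ffmpeg options; the file name and resolution are placeholders, and note that the decoded frames still travel back to system memory over the pipe:

```python
import subprocess
import numpy as np
import torch

width, height = 1920, 1080  # assumed resolution, e.g. obtained from ffprobe
cmd = [
    "ffmpeg", "-hide_banner", "-loglevel", "error",
    "-hwaccel", "cuda",               # decode on the GPU when the codec is supported
    "-i", "input.mp4",                # placeholder input file
    "-f", "rawvideo", "-pix_fmt", "rgb24", "pipe:1",
]
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

frame_bytes = width * height * 3      # rgb24 frame size
while True:
    buf = proc.stdout.read(frame_bytes)
    if len(buf) < frame_bytes:
        break
    frame = torch.from_numpy(
        np.frombuffer(buf, dtype=np.uint8).reshape(height, width, 3).copy()
    )
    # ... feed `frame` into the rest of the pipeline here
proc.wait()
```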
I agree that it would be nice, but it's not straightforward. Support varies a lot across different GPU models. In short, if users want to use GPU decoding, they will very likely have to prepare the videos beforehand. I find it much simpler to pass a flag and let the user hit the codec exception if the format is wrong.
Let me just add that DALI's fixed-frame-rate support is not that bad.
If I understand correctly, the main drawback of GPU decoding is the lack of supported codecs? I.e. one would most likely need to prepare videos specifically for that?

@JuanFMontesinos have you used DALI extensively for things like audio/video training? I'm interested in whether there are some weird/unexpected failure cases there, specifically with multiple modalities (streams) - in the past NVCODEC had some major issues with that, and I've been out of touch with it for the last year or so.
Hi @bjuncek, so yes, the main drawback is that Nvidia only supports h264 (and h265 encoding from the 30XX generation onwards).

In my experience DALI works really well, as it optimizes resizing/cropping ops at the time of decoding the data (I'm not an expert, just mentioning what I read in their docs). The user doesn't have access to the streams, so I can't comment on that.

IMO it would be really awesome to adapt this to PyTorch natively. A possible drawback I see is PyTorch's DataLoader, which is written in Python and uses multiprocessing. As far as I know, that is problematic together with tensors allocated on the GPU, so this may force a C++ decoder?
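To illustrate the DataLoader concern: the usual pattern today is to decode on the CPU inside the worker processes and only move frames to the GPU in the main process. A rough sketch, assuming fixed-length, same-resolution clips and placeholder file paths:

```python
import torch
from torch.utils.data import DataLoader, Dataset
from torchvision.io import read_video

class ClipDataset(Dataset):
    """Decodes clips on the CPU inside DataLoader worker processes."""
    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # (T, H, W, C) uint8 tensor on the CPU
        video, _, _ = read_video(self.paths[idx], pts_unit="sec")
        # keep the first 16 frames as (T, C, H, W); assumes clips are long enough
        return video[:16].permute(0, 3, 1, 2)

paths = ["clip_000.mp4", "clip_001.mp4"]  # placeholder paths
loader = DataLoader(ClipDataset(paths), batch_size=2, num_workers=4, pin_memory=True)

for batch in loader:
    batch = batch.cuda(non_blocking=True)  # async host-to-device copy via pinned memory
    # ... GPU transforms + model forward would go here
```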
(copy from #4392) I'm back after some time and have been doing some benchmarking: < insert funny gif here >

In straight-up video reading, it is not obvious that GPU decoding is actually faster than CPU decoding [1]. In chats with some people who have tried it in their training pipelines, it seems like there is a benefit to GPU decoding in end-to-end pipelines where decoded frames can be directly manipulated (transforms) and consumed (model) by the GPU. Note that Mike from PyAV had similar thoughts and reasoning for not supporting HWAD in their docs [2].

[1] Note: these comparisons were done on a machine with an above-average CPU and what used to be quite a competitive GPU. Running it on different hardware would probably give different results.
[2] https://pyav.org/docs/develop/overview/about.html : see the section "Unsupported features"
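For anyone wanting to reproduce this kind of comparison, a minimal timing-harness sketch; `decode_cpu` and `decode_gpu` are hypothetical callables standing in for the two decode paths:

```python
import time
import statistics

def benchmark(decode_fn, path, repeats=10):
    """Return the median wall-clock time of decoding `path` end to end."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        decode_fn(path)
        times.append(time.perf_counter() - start)
    return statistics.median(times)

# print("CPU :", benchmark(decode_cpu, "clip.mp4"))  # hypothetical CPU decode path
# print("GPU :", benchmark(decode_gpu, "clip.mp4"))  # hypothetical NVDEC-backed path
```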
Thanks for sharing this info! This is good data that I'll incorporate into future discussions. You've already touched on the main case for PyTorch, so I'll elaborate a little on it here and hopefully pose some questions that will move us closer to deciding on the right approach.

You've already referenced the main case to be made in favor of GPU-side video decode: GPU-side preprocessing pipelines. The performance gains of transcoding video footage on a GPU are mainly limited by the overhead of communication over the PCI-E bus. However, the intended goal of adding this feature to PyTorch is to enable users to construct high-throughput preprocessing pipelines by handling CUDA Tensors provided straight from an iterable. DALI shows great promise, and the devs are always responsive on their issue page. That being said, they are limited by the fact that the decoding process is tightly coupled to the parameters of the preprocessing pipeline, making it difficult to support variable-frame-rate footage. Furthermore, GPU-side decode uses a minimal amount of GPU resources while freeing the CPU from performing the task, which is quite resource intensive.

So the question follows: where does one draw the line for performance expectations of PyTorch's Python API? If you're getting to the point where things like the GIL are bottlenecking inference, you can see a fantastic writeup here showing significant inference speedup when video decoding is moved from the CPU to the GPU (along with other things). There's a strong case to be made that if you're deploying computer vision models, Python isn't the best choice for performance.

I haven't looked at Decord in a while, and it appears that they've put a lot of work into their wrapper around the video encode/decode hardware on Nvidia GPUs. I'll have to check it out. Thanks again for sharing the useful info!
@bryandeng We added a GPU video decoder recently. Detailed installation instructions can be found here. It would be very helpful if you gave it a try and reported any feedback you may have.
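For anyone else trying it out, here is a rough sketch of what usage looks like, assuming a torchvision build compiled with the CUDA video decoder enabled; the exact constructor arguments may differ between releases, and the file path is a placeholder:

```python
from torchvision.io import VideoReader

# Requires a torchvision build with the GPU (NVDEC) decoder enabled.
reader = VideoReader("input.mp4", device="cuda")

frames = [frame["data"] for frame in reader]  # each entry should be a CUDA tensor
print(len(frames), frames[0].shape, frames[0].device)
```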
@prabhat00155 Thanks a lot! I will give it a try.
Hi, I just remembered there is an Nvidia toolkit. Just pasting the description here, as I haven't tried it:
Best |
🚀 Feature
Hardware-accelerated video decoding
Motivation
Now that torchscriptable transforms natively supporting GPU have landed, hardware-accelerated video decoding may further help relieve the IO bottleneck commonly seen in large-scale video deep learning tasks.
Pitch
This functionality is likely to be built upon FFmpeg's hardware acceleration APIs, since FFmpeg is already in use and it's easier to support multiple hardware platforms and platform APIs in this way.
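As a first step, one could simply probe which acceleration methods the local FFmpeg build exposes. A small sketch using the standard ffmpeg CLI; the reported names such as `cuda`, `vaapi`, or `qsv` come from ffmpeg itself:

```python
import shutil
import subprocess

# Probe which hardware acceleration methods the local ffmpeg build supports.
ffmpeg = shutil.which("ffmpeg")
if ffmpeg is None:
    raise RuntimeError("ffmpeg not found on PATH")

out = subprocess.run(
    [ffmpeg, "-hide_banner", "-hwaccels"],
    capture_output=True, text=True, check=True,
).stdout

# First line is the "Hardware acceleration methods:" header.
hwaccels = [line.strip() for line in out.splitlines()[1:] if line.strip()]
print("Available hwaccels:", hwaccels)  # e.g. ['cuda', 'vaapi', 'qsv']
```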
Alternatives
Decord and NVIDIA VPF are both PyTorch-friendly video IO libraries which support (NVIDIA only) hardware-accelerated video decoding to some extent.
NVIDIA VPF is built upon NVIDIA Video Codec SDK directly without FFmpeg.
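For comparison, a rough sketch of the Decord route, assuming a Decord build with CUDA support and a placeholder file with at least 64 frames:

```python
import decord
from decord import VideoReader, gpu

# Return PyTorch tensors directly from Decord's reader.
decord.bridge.set_bridge("torch")

vr = VideoReader("input.mp4", ctx=gpu(0))       # decode with NVDEC on GPU 0
batch = vr.get_batch(list(range(0, 64, 4)))     # every 4th frame of the first 64
print(batch.shape, batch.dtype)                 # (16, H, W, 3), uint8
```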
Additional context
https://trac.ffmpeg.org/wiki/HWAccelIntro
https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new