[video reader] inception commit #1303

Merged on Sep 20, 2019 (32 commits)

@stephenyan1231
Contributor

stephenyan1231 commented Sep 6, 2019

This PR implements a C++ video decoder, referred to below as the TorchVision (TV) video reader.

Attention

This PR replaces the original PR (#1279), which was contaminated by unrelated commits.

Main features

  • Decode both video frames and audio waveform in a single pass
  • Seek to a user-specified timestamp in both the video and audio streams, and decode frames starting from there. An optional end timestamp specifies where decoding should stop.
  • For video decoding, support rescaling the height/width and selecting a specific AVPixelFormat (default: AV_PIX_FMT_RGB24)
  • For audio decoding, support resampling audio to a user-specified sampling rate and number of channels. The user can also specify the AVSampleFormat (default: AV_SAMPLE_FMT_FLT)
  • Support decoding pts only, skipping the actual video/audio frame data. This is useful in the dataset initialization stage, where an index of the video dataset needs to be built and only pts information is needed
  • Support decoding only the video stream and ignoring the audio stream, and vice versa.
  • Other changes
    • add methods load_metadata() and save_metadata() to class VideoClips in video_utils.py
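
The timestamp-based seeking described above can be illustrated with a small sketch. It is not the PR's C++ implementation. It only shows how a start and end time in seconds might map to pts values given a stream time base. The helper names, the inclusive end bound, and the 1/30000 time base are assumptions made for illustration.

```python
from fractions import Fraction
from math import floor


def seconds_to_pts(seconds, time_base):
    """Convert a timestamp in seconds to pts units of the stream (rounded down)."""
    return floor(Fraction(seconds) / time_base)


def select_pts_in_range(frame_pts, time_base, start_sec, end_sec=None):
    """Keep the pts values that fall in [start_sec, end_sec].

    Assumption: the end timestamp is treated as inclusive, and end_sec=None
    means "decode until the end of the stream".
    """
    start = seconds_to_pts(start_sec, time_base)
    end = None if end_sec is None else seconds_to_pts(end_sec, time_base)
    return [p for p in frame_pts if p >= start and (end is None or p <= end)]


# Hypothetical 29.97 fps stream: time base 1/30000, one frame every 1001 ticks.
time_base = Fraction(1, 30000)
frame_pts = [i * 1001 for i in range(90)]
clip = select_pts_in_range(frame_pts, time_base, start_sec=1, end_sec=2)
```

In pts-only mode, a list like frame_pts is all that would be needed to build a dataset index, which is why skipping the frame data can pay off at initialization time.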

APIs

The main API includes

  • FfmpegDecoder::decodeFile(....): decode frames from a given video file. This is useful for both OSS and FB research projects, where videos are stored as files in a folder.
  • FfmpegDecoder::decodeMemory(....): decode frames from a given compressed video byte array. This is useful for decoding everstore videos.

Sanity check

  • No memory leak is detected.

unit tests

  • test/test_video_reader.py
  • test/test_io.py

Changes to TorchVision installation

The video reader depends on ffmpeg 4. To install it, use conda.

  • conda install -c conda-forge ffmpeg=4.0.2
  • You can also install PyAV via conda install -c conda-forge av, which automatically installs the ffmpeg dependency.

Benchmark

We use several videos from HMDB-51, UCF-101 and Kinetics-400 for benchmarking and unit tests. The test videos are listed below.

  • RATRACE_wave_f_nm_np1_fr_goo_37.avi

    • source: hmdb51
    • video: DivX MPEG-4
      • fps: 30
    • audio: N/A
  • SchoolRulesHowTheyHelpUs_wave_f_nm_np1_ba_med_0.avi

    • source: hmdb51
    • video: DivX MPEG-4
      • fps: 30
    • audio: N/A
  • TrumanShow_wave_f_nm_np1_fr_med_26.avi

    • source: hmdb51
    • video: DivX MPEG-4
      • fps: 30
    • audio: N/A
  • v_SoccerJuggling_g23_c01.avi

    • source: ucf101
    • video: Xvid MPEG-4
      • fps: 29.97
    • audio: N/A
  • v_SoccerJuggling_g24_c01.avi

    • source: ucf101
    • video: Xvid MPEG-4
      • fps: 29.97
    • audio: N/A
  • R6llTwEh07w.mp4

    • source: kinetics-400
    • video: H-264 - MPEG-4 AVC (part 10) (avc1)
      • fps: 30
    • audio: MPEG AAC audio (mp4a)
      • sample rate: 44.1 kHz
  • SOX5yA1l24A.mp4

    • source: kinetics-400
    • video: H-264 - MPEG-4 AVC (part 10) (avc1)
      • fps: 29.97
    • audio: MPEG AAC audio (mp4a)
      • sample rate: 48 kHz
  • WUzgd7C1pWA.mp4

    • source: kinetics-400
    • video: H-264 - MPEG-4 AVC (part 10) (avc1)
      • fps: 29.97
    • audio: MPEG AAC audio (mp4a)
      • sample rate: 48 kHz

Unit test

  • We compare the decoding speed of the TorchVision video reader and PyAV in the following cases:
    • decode the full video from file / memory
    • decode a fixed number of frames (e.g. [4, 8, 16, 32, 64, 128]) starting at a randomly selected timestamp
  • We test rescaling of video frames and resampling of audio waveforms.
  • We run a stress test that decodes videos repeatedly to ensure there is no memory leak.
  • We compare the decoding results when only pts is requested against the results when both pts and video/audio frames are requested, and check that the returned pts data are identical. We also compare decoding time to validate that decoding is faster when only pts is needed.
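
A speed comparison of this kind could be set up roughly as follows. This is a minimal sketch, not the harness used in test/test_video_reader.py. The helper names time_decoder and speedup are hypothetical, and the two lambdas are stand-ins for real decoder calls.

```python
import time


def time_decoder(decode_fn, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        decode_fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_fn, candidate_fn, repeats=5):
    """Ratio of baseline time to candidate time; above 1 means the candidate is faster."""
    return time_decoder(baseline_fn, repeats) / time_decoder(candidate_fn, repeats)


# Stand-in workloads; in practice these would wrap the two decoders being compared.
slow_decode = lambda: sum(range(200000))
fast_decode = lambda: sum(range(1000))
ratio = speedup(slow_decode, fast_decode, repeats=3)
```

Taking the best of several runs reduces noise from disk caching and other processes, which matters when the two decoders are close in speed.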

The unit test results are attached.

torchvision.video.reader.unit.test.log

Comparison with PyAV

  • When decoding all video/audio frames in the video, the TorchVision video reader is 1.2x - 6x faster, depending on the codec and video length
  • When decoding a fixed number of video frames (e.g. [4, 8, 16, 32, 64, 128]), the TorchVision video reader is about as fast as PyAV for small values (i.e. [4, 8, 16]) and up to 3x faster for large values (e.g. [32, 64, 128])

@stephenyan1231
Contributor Author

I am continuing the discussion from the original PR (#1279) below.

@stephenyan1231
Contributor Author

stephenyan1231 commented Sep 10, 2019

From @fmassa


Thanks a lot for the PR Zhicheng!

The first thing I need to figure out before we can merge this is how we will be adding ffmpeg as a dependency for torchvision, and if it will be a soft or hard dependency.

A few options:

1. use ffmpeg from conda-forge
2. pull the ffmpeg source and compile it together with torchvision
3. use the packages provided by ffmpeg

Also, what is the version of FFMpeg that we will be relying upon?

Another thing I need to do is to get CI working for Windows and OSX in torchvision, so that we can make sure that this PR compiles and works nicely in the other OS that torchvision supports.

I'll be looking into both the CI and ffmpeg dependency from an OSS perspective.


Option 1 will serve well.
Run conda install -c conda-forge ffmpeg=4.0.2 to install ffmpeg.
I tested the video reader with ffmpeg 4.0.2, which is also the latest version of ffmpeg used by PyAV. TorchVision still depends on PyAV for now, and installing an ffmpeg newer than 4.0.2 will conflict with PyAV. So I recommend installing ffmpeg 4.0.2.

In an Anaconda virtual environment (say, pytorch_py3_exp):

  • header files of ffmpeg will be installed at anaconda3/envs/pytorch_py3_exp/include
  • shared object files of ffmpeg will be installed at anaconda3/envs/pytorch_py3_exp/lib

We need to add anaconda3/envs/pytorch_py3_exp/include to the include_dirs of video reader cpp extension in setup.py. I have updated setup.py to do so.
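
A build script could locate these directories roughly as follows. This is a sketch under assumptions, not the code added to setup.py in this PR. The helper name find_ffmpeg_dirs is hypothetical, and it assumes the active environment's prefix is available as sys.prefix and that ffmpeg headers follow the usual include/libavcodec/avcodec.h layout.

```python
import os
import sys


def find_ffmpeg_dirs(prefix=sys.prefix):
    """Return (include_dir, lib_dir) under `prefix`, or None if ffmpeg headers are missing.

    Assumption: a conda-style layout with headers in <prefix>/include and
    shared objects in <prefix>/lib.
    """
    include_dir = os.path.join(prefix, "include")
    lib_dir = os.path.join(prefix, "lib")
    # libavcodec/avcodec.h is used here as a marker that ffmpeg headers are present.
    header = os.path.join(include_dir, "libavcodec", "avcodec.h")
    if not os.path.isfile(header):
        return None
    return include_dir, lib_dir


dirs = find_ffmpeg_dirs()
if dirs is None:
    print("ffmpeg headers not found; the video reader extension would be skipped")
else:
    print("include_dirs:", dirs[0], "library_dirs:", dirs[1])
```

The two returned paths are what one would pass as include_dirs and library_dirs when declaring the extension. Returning None instead of raising makes it easy to treat the extension as optional.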

@stephenyan1231
Contributor Author

stephenyan1231 commented Sep 10, 2019

From @soumith


i think it might be a good start to start with (1), i.e. the ffmpeg from conda-source or system package manager (brew install ffmpeg / apt install ffmpeg). Also, by ffmpeg I presume you mean libav?

For binaries, we will figure out how to ship ffmpeg the right way ourselves. Just building ffmpeg from source is not sufficient btw, because you need to build it with codec support, and there are tons of codecs we need to build it with.


I agree building ffmpeg from source is not a smooth process (see lengthy instructions here: https://trac.ffmpeg.org/wiki/CompilationGuide/Centos).

On the other hand, conda install -c conda-forge ffmpeg=4.0.2 will install the header files and shared object files, which are all we need.

@fmassa
Member

fmassa commented Sep 19, 2019

@stephenyan1231 I've fixed lint and made the video_reader extension to be optional.

Can you look into the final build failures that happen for some specific versions of gcc?

@codecov-io

Codecov Report

Merging #1303 into master will decrease coverage by 0.42%.
The diff coverage is 28%.

@@            Coverage Diff             @@
##           master    #1303      +/-   ##
==========================================
- Coverage   65.47%   65.04%   -0.43%     
==========================================
  Files          75       76       +1     
  Lines        5827     5902      +75     
  Branches      892      901       +9     
==========================================
+ Hits         3815     3839      +24     
- Misses       1742     1795      +53     
+ Partials      270      268       -2

Impacted Files                          Coverage Δ
torchvision/io/__init__.py              100% <100%> (ø) ⬆️
torchvision/io/_video_opt.py            27.02% <27.02%> (ø)
torchvision/transforms/transforms.py    80.98% <0%> (+0.58%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f677ea3...5ecbd6a.

@lucasjinreal

Which ffmpeg version is this using?

 /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp: In member function ‘int FfmpegStream::openCodecContext()’:
    /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp:31:23: error: ‘AVStream {aka struct AVStream}’ has no member named ‘codecpar’
       auto codec_id = st->codecpar->codec_id;
                           ^
    /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp:48:59: error: ‘AVStream {aka struct AVStream}’ has no member named ‘codecpar’
       if ((ret = avcodec_parameters_to_context(codecCtx_, st->codecpar)) < 0) {
                                                               ^
    /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp:48:67: error: ‘avcodec_parameters_to_context’ was not declared in this scope
       if ((ret = avcodec_parameters_to_context(codecCtx_, st->codecpar)) < 0) {
                                                                       ^
    /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp: In member function ‘int FfmpegStream::sendPacket(const AVPacket*)’:
    /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp:146:47: error: ‘avcodec_send_packet’ was not declared in this scope
       return avcodec_send_packet(codecCtx_, packet);
                                                   ^
    /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp: In member function ‘int FfmpegStream::receiveFrame()’:
    /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp:150:52: error: ‘avcodec_receive_frame’ was not declared in this scope
       int ret = avcodec_receive_frame(codecCtx_, frame_);
                                                        ^
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Got some error...
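
The missing symbols in this log (AVStream::codecpar, avcodec_parameters_to_context, avcodec_send_packet, avcodec_receive_frame) were all introduced in FFmpeg 3.1, so errors like these usually point to older ffmpeg headers on the build machine. The sketch below checks the banner printed by ffmpeg -version against that minimum. The helper names are hypothetical, the 3.1 threshold is inferred from the symbols above rather than stated in this PR, and the PR itself was tested with 4.0.2.

```python
import re

# Assumption: FFmpeg 3.1 is the first release that ships codecpar and the
# send/receive decoding API that the errors above complain about.
MIN_FFMPEG = (3, 1)


def parse_ffmpeg_version(banner):
    """Extract (major, minor) from the first line of `ffmpeg -version` output.

    Returns None for banners without a numeric release, such as git snapshot builds.
    """
    match = re.search(r"ffmpeg version n?(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def has_new_decode_api(banner):
    """True if the reported version is at least MIN_FFMPEG."""
    version = parse_ffmpeg_version(banner)
    return version is not None and version >= MIN_FFMPEG


old_banner = "ffmpeg version 2.8.15-0ubuntu0.16.04.1 Copyright (c) 2000-2018 the FFmpeg developers"
new_banner = "ffmpeg version 4.0.2 Copyright (c) 2000-2018 the FFmpeg developers"
print(has_new_decode_api(old_banner), has_new_decode_api(new_banner))
```

Note that the command-line ffmpeg and the headers the compiler picks up can come from different installations, so a passing check here does not guarantee a successful build.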

facebook-github-bot pushed a commit to facebookresearch/ClassyVision that referenced this pull request Oct 16, 2019
Summary:
Pull Request resolved: #62

Current dependency torchvision 0.4.0 was released in August.
It missed quite a few PRs that are merged after that, and that are needed for video classification, such as

- pytorch/vision#1437
- pytorch/vision#1431
- pytorch/vision#1423
- pytorch/vision#1418
- pytorch/vision#1408
- pytorch/vision#1376
- pytorch/vision#1363
- pytorch/vision#1353
- pytorch/vision#1303

This will fail the CI test when a diff uses changes made in those PRs.
Before a new official version of TorchVision is released, we can temporarily use the nightly torchvision to get all the recent PRs, and unblock the PR merging.
We plan to use a fixed version of TorchVision later.

Reviewed By: vreis

Differential Revision: D17944239

fbshipit-source-id: 86ff540e3fc4f08ef767e84ef103525db5158201
@fmassa fmassa mentioned this pull request Oct 31, 2019
fmassa pushed a commit that referenced this pull request Oct 31, 2019
* [video reader] inception commit

* add method save_metadata to class VideoClips in video_utils.py

* add load_metadata() method to VideoClips class

* add Exception to not catch unexpected events such as memory errors, interrupt

* fix bugs in video_plus.py

* [video reader]remove logging. update setup.py

* remove time measurement in test_video_reader.py

* Remove glog and try making ffmpeg finding more robust

* Add ffmpeg to conda build

* Add ffmpeg to conda build [again]

* Make library path finding more robust

* Missing import

* One more missing fix for import

* Py2 compatibility and change package to av to avoid version conflict with ffmpeg

* Fix for python2

* [video reader] support to decode one stream only (e.g. video/audio stream)

* remove argument _precomputed_metadata_filepath

* remove save_metadata method

* add get_metadata method

* expose _precomputed_metadata and frame_rate arguments in video dataset __init__ method

* remove ssize_t

* remove size_t to pass CI check on Windows

* add PyInit__video_reader function to pass CI check on Windows

* minor fix to define PyInit_video_reader symbol

* Make c++ video reader optional

* Temporarily revert changes to test_io

* Revert changes to python files

* Rename files to make it private

* Fix python lint

* Fix C++ lint

* add a functor object EnumClassHash to make Enum class instances usable as key type of std::unordered_map

* fix cpp format check

* Fix cherry-pick conflict for 0.4.2 release

4 participants