[video reader] inception commit #1303

stephenyan1231 · 2019-09-06T18:43:53Z

Implement a C++ video decoder, referred to below as the TorchVision (TV) video reader.

Attention

This PR replaces the original PR (#1279), which was contaminated by unrelated commits.

Main features

Decode both video frames and audio waveform in a single pass.
Seek to a user-specified timestamp in both the video and audio streams and decode frames starting from there. An optional end timestamp specifies where decoding should stop.
For video decoding, support rescaling the height/width and selecting a specific AVPixelFormat (default: AV_PIX_FMT_RGB24).
For audio decoding, support resampling to a user-specified sampling rate and number of channels. The user can also specify the AVSampleFormat (default: AV_SAMPLE_FMT_FLT).
Support decoding pts only, skipping the actual video/audio frame data. This is useful in the dataset initialization stage, where an index of the video dataset needs to be built and only pts information is needed.
Support decoding only the video stream and ignoring the audio stream, and vice versa.

Other changes

- add methods load_metadata() and save_metadata() to class VideoClips in video_utils.py
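
As a rough illustration of the seek-by-timestamp and pts-only features listed above, the sketch below shows how a start/end time in seconds might be mapped to pts and used to select frames from a prebuilt pts index. It is not the video reader's API: the helper names, the 1/30000 time base, and the synthetic pts list are all assumptions made for the example.

```python
import math
from fractions import Fraction


def seconds_to_pts(seconds, time_base):
    """Convert a time in seconds to a pts value, rounding down.

    time_base is the duration of one pts tick in seconds (a Fraction).
    """
    return math.floor(Fraction(seconds) / time_base)


def pts_to_seconds(pts, time_base):
    """Convert a pts value back to seconds (exact, as a Fraction)."""
    return pts * time_base


def select_pts(all_pts, start_pts, end_pts):
    """Keep the pts values that fall inside [start_pts, end_pts]."""
    return [p for p in all_pts if start_pts <= p <= end_pts]


# Hypothetical stream: 29.97 fps video with a 1/30000 time base,
# so consecutive frames are 1001 ticks apart.
time_base = Fraction(1, 30000)
frame_pts = [i * 1001 for i in range(90)]

# Select the frames between second 1 and second 2 using only the pts index.
start = seconds_to_pts(1, time_base)
end = seconds_to_pts(2, time_base)
clip = select_pts(frame_pts, start, end)
print(len(clip), clip[0], clip[-1])
```

An index like frame_pts is the kind of data a pts-only decoding pass would produce, which is why skipping frame data is enough for dataset initialization.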

APIs

The main API includes

FfmpegDecoder::decodeFile(....): decodes frames from a given video file. This is useful for both OSS and FB research projects, where videos reside in a file folder.
FfmpegDecoder::decodeMemory(....): decodes frames from a given compressed video byte array. This is useful for decoding everstore videos.

Sanity check

No memory leak is detected.

unit tests

test/test_video_reader.py
test/test_io.py

Changes to TorchVision installation

The video reader depends on ffmpeg 4. To install it, use conda.

conda install -c conda-forge ffmpeg=4.0.2
You can also install PyAV via conda install -c conda-forge av, which will automatically install the ffmpeg dependency.

Benchmark

We use several videos from HMDB-51, UCF-101 and Kinetics-400 for benchmarking and unit tests. The test videos are listed below.

RATRACE_wave_f_nm_np1_fr_goo_37.avi
- source: hmdb51
- video: DivX MPEG-4
  - fps: 30
- audio: N/A
SchoolRulesHowTheyHelpUs_wave_f_nm_np1_ba_med_0.avi
- source: hmdb51
- video: DivX MPEG-4
  - fps: 30
- audio: N/A
TrumanShow_wave_f_nm_np1_fr_med_26.avi
- source: hmdb51
- video: DivX MPEG-4
  - fps: 30
- audio: N/A
v_SoccerJuggling_g23_c01.avi
- source: ucf101
- video: Xvid MPEG-4
  - fps: 29.97
- audio: N/A
v_SoccerJuggling_g24_c01.avi
- source: ucf101
- video: Xvid MPEG-4
  - fps: 29.97
- audio: N/A
R6llTwEh07w.mp4
- source: kinetics-400
- video: H-264 - MPEG-4 AVC (part 10) (avc1)
  - fps: 30
- audio: MPEG AAC audio (mp4a)
  - sample rate: 44.1K Hz
SOX5yA1l24A.mp4
- source: kinetics-400
- video: H-264 - MPEG-4 AVC (part 10) (avc1)
  - fps: 29.97
- audio: MPEG AAC audio (mp4a)
  - sample rate: 48K Hz
WUzgd7C1pWA.mp4
- source: kinetics-400
- video: H-264 - MPEG-4 AVC (part 10) (avc1)
  - fps: 29.97
- audio: MPEG AAC audio (mp4a)
  - sample rate: 48K Hz

Unit test

We compare the decoding speed of the TorchVision video reader and PyAV in the following cases:
- decode the full video from file / memory
- decode a fixed number of frames (e.g. [4, 8, 16, 32, 64, 128]) starting at a randomly selected timestamp
We test rescaling of video frames and resampling of audio waveforms.
We ran a stress test that decodes videos iteratively to ensure there is no memory leak.
We compare the decoding results when only pts is requested against the results when both pts and video/audio frames are requested, and ensure the returned pts data are identical. We also compare decoding speed to validate that decoding is faster when only pts is needed.
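
The comparisons described above could be structured roughly as follows. This is only a sketch of the test pattern, not the code in test/test_video_reader.py: the two decoder functions are hypothetical stand-ins that fabricate pts and frame data so the example runs without ffmpeg.

```python
import time


def time_call(fn, *args, repeats=3):
    """Run fn several times; return (best wall-clock seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        t0 = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best, result


# Hypothetical stand-ins for a real decoder: both return (pts, frames).
def decode_full(num_frames):
    pts = [i * 1001 for i in range(num_frames)]
    frames = [bytes(64) for _ in pts]  # placeholder frame payloads
    return pts, frames


def decode_pts_only(num_frames):
    pts = [i * 1001 for i in range(num_frames)]
    return pts, []  # frame data is skipped


t_full, (pts_full, frames) = time_call(decode_full, 1000)
t_pts, (pts_only, no_frames) = time_call(decode_pts_only, 1000)

# The pts returned by both modes must be identical.
assert pts_full == pts_only
print(f"full: {t_full:.6f}s, pts only: {t_pts:.6f}s")
```

Taking the best of several runs is one common way to reduce timing noise; a real benchmark would swap the stand-ins for calls into the two decoders being compared.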

Results of the unit tests are attached.

torchvision.video.reader.unit.test.log

Comparison with PyAV

When decoding all video/audio frames in the video, the TorchVision video reader is 1.2x - 6x faster, depending on the codec and video length.
When decoding a fixed number of video frames (e.g. [4, 8, 16, 32, 64, 128]), the TorchVision video reader runs equally fast for small values (i.e. [4, 8, 16]) and up to 3x faster for large values (i.e. [32, 64, 128]).

stephenyan1231 · 2019-09-10T03:03:27Z

I am continuing the discussion from the original PR (#1279) below.

stephenyan1231 · 2019-09-10T03:11:42Z

From @fmassa

Thanks a lot for the PR Zhicheng!

The first thing I need to figure out before we can merge this is how we will be adding ffmpeg as a dependency for torchvision, and if it will be a soft or hard dependency.

A few options:

1. use ffmpeg from conda-forge
2. pull the ffmpeg source and compile it together with torchvision
3. use the packages provided by ffmpeg

Also, what is the version of FFMpeg that we will be relying upon?

Another thing I need to do is to get CI working for Windows and OSX in torchvision, so that we can make sure that this PR compiles and works nicely in the other OS that torchvision supports.

I'll be looking into both the CI and ffmpeg dependency from an OSS perspective.

Option 1 will serve well.
Run conda install -c conda-forge ffmpeg=4.0.2 to install ffmpeg.
I tested the video reader with ffmpeg 4.0.2, which is also the latest version of ffmpeg used by PyAV. TorchVision still has some dependency on PyAV, and installing an ffmpeg newer than 4.0.2 will cause a conflict with PyAV, so I recommend installing ffmpeg 4.0.2.

In an Anaconda virtual env (say pytorch_py3_exp):

header files of ffmpeg will be installed at anaconda3/envs/pytorch_py3_exp/include
shared object files of ffmpeg will be installed at anaconda3/envs/pytorch_py3_exp/lib

We need to add anaconda3/envs/pytorch_py3_exp/include to the include_dirs of video reader cpp extension in setup.py. I have updated setup.py to do so.
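
One way such include and library directories might be located is sketched below. This is not the code in this PR's setup.py: guess_ffmpeg_dirs is a hypothetical helper, and it assumes the conventional layout where the ffmpeg executable lives in <root>/bin with headers in <root>/include and shared objects in <root>/lib.

```python
import os
import shutil
import sys


def guess_ffmpeg_dirs(ffmpeg_exe=None, prefix=None):
    """Guess (include_dir, library_dir) for an ffmpeg installation.

    If an ffmpeg executable is given or found on PATH at <root>/bin/ffmpeg,
    assume headers in <root>/include and libraries in <root>/lib.
    Otherwise fall back to the given prefix or the active Python prefix.
    """
    if ffmpeg_exe is None:
        ffmpeg_exe = shutil.which("ffmpeg")
    if ffmpeg_exe:
        bin_dir = os.path.dirname(os.path.abspath(ffmpeg_exe))
        root = os.path.dirname(bin_dir)
    else:
        root = prefix or sys.prefix
    return os.path.join(root, "include"), os.path.join(root, "lib")


# Example with an explicit (made-up) executable path.
include_dir, library_dir = guess_ffmpeg_dirs(
    ffmpeg_exe="/opt/envs/pytorch_py3_exp/bin/ffmpeg"
)
print(include_dir)
print(library_dir)
```

The two returned paths are the sort of values that would be passed as include_dirs and library_dirs when declaring the extension.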

stephenyan1231 · 2019-09-10T03:15:18Z

From @soumith

i think it might be a good start to start with (1), i.e. the ffmpeg from conda-source or system package manager (brew install ffmpeg / apt install ffmpeg). Also, by ffmpeg I presume you mean libav?

For binaries, we will figure out how to ship ffmpeg the right way ourselves. Just building ffmpeg from source is not sufficient btw, because you need to build it with codec support, and there are tons of codecs we need to build it with.

I agree building ffmpeg from source is not a smooth process (see lengthy instructions here: https://trac.ffmpeg.org/wiki/CompilationGuide/Centos).

On the other hand, conda install -c conda-forge ffmpeg=4.0.2 will install the header files and shared object files, which are all we need.

fmassa · 2019-09-19T14:09:17Z

@stephenyan1231 I've fixed lint and made the video_reader extension to be optional.

Can you look into the final build failures that happen for some specific versions of gcc?

codecov-io · 2019-09-19T18:25:04Z

Codecov Report

Merging #1303 into master will decrease coverage by 0.42%.
The diff coverage is 28%.

@@            Coverage Diff             @@
##           master    #1303      +/-   ##
==========================================
- Coverage   65.47%   65.04%   -0.43%     
==========================================
  Files          75       76       +1     
  Lines        5827     5902      +75     
  Branches      892      901       +9     
==========================================
+ Hits         3815     3839      +24     
- Misses       1742     1795      +53     
+ Partials      270      268       -2

| Impacted Files | Coverage Δ | |
|---|---|---|
| torchvision/io/__init__.py | `100% <100%> (ø)` | ⬆️ |
| torchvision/io/_video_opt.py | `27.02% <27.02%> (ø)` | |
| torchvision/transforms/transforms.py | `80.98% <0%> (+0.58%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f677ea3...5ecbd6a.

lucasjinreal · 2019-10-12T09:24:52Z

Which ffmpeg version is this using?

 /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp: In member function ‘int FfmpegStream::openCodecContext()’:
    /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp:31:23: error: ‘AVStream {aka struct AVStream}’ has no member named ‘codecpar’
       auto codec_id = st->codecpar->codec_id;
                           ^
    /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp:48:59: error: ‘AVStream {aka struct AVStream}’ has no member named ‘codecpar’
       if ((ret = avcodec_parameters_to_context(codecCtx_, st->codecpar)) < 0) {
                                                               ^
    /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp:48:67: error: ‘avcodec_parameters_to_context’ was not declared in this scope
       if ((ret = avcodec_parameters_to_context(codecCtx_, st->codecpar)) < 0) {
                                                                       ^
    /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp: In member function ‘int FfmpegStream::sendPacket(const AVPacket*)’:
    /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp:146:47: error: ‘avcodec_send_packet’ was not declared in this scope
       return avcodec_send_packet(codecCtx_, packet);
                                                   ^
    /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp: In member function ‘int FfmpegStream::receiveFrame()’:
    /tmp/pip-fq1441jp-build/torchvision/csrc/cpu/video_reader/FfmpegStream.cpp:150:52: error: ‘avcodec_receive_frame’ was not declared in this scope
       int ret = avcodec_receive_frame(codecCtx_, frame_);
                                                        ^
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

I got some errors...
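
The missing symbols in the log above (codecpar, avcodec_parameters_to_context, avcodec_send_packet, avcodec_receive_frame) belong to APIs that, as far as I know, were introduced around FFmpeg 3.1, so errors like these typically suggest the build picked up older ffmpeg headers. A minimal sketch of checking the installed version is shown below. The helper names, the (3, 1) threshold, and the sample banner strings are assumptions for illustration; in practice the banner would come from the first line of `ffmpeg -version`.

```python
import re


def parse_ffmpeg_version(banner):
    """Extract (major, minor) from an `ffmpeg -version` banner line.

    Returns None when the banner has no numeric release version
    (for example, some git snapshot builds).
    """
    match = re.search(r"ffmpeg version n?(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def has_new_decode_api(version, minimum=(3, 1)):
    """Assumed threshold: codecpar and send/receive decoding need >= 3.1."""
    return version is not None and version >= minimum


# Made-up banner strings for illustration.
old_banner = "ffmpeg version 2.8.15-0ubuntu0.16.04.1 Copyright (c) 2000-2018 the FFmpeg developers"
new_banner = "ffmpeg version 4.0.2 Copyright (c) 2000-2018 the FFmpeg developers"

for banner in (old_banner, new_banner):
    version = parse_ffmpeg_version(banner)
    print(version, has_new_decode_api(version))
```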

Summary: Pull Request resolved: #62

Current dependency torchvision 0.4.0 was released in August. It missed quite a few PRs that are merged after that, and that are needed for video classification, such as
- pytorch/vision#1437
- pytorch/vision#1431
- pytorch/vision#1423
- pytorch/vision#1418
- pytorch/vision#1408
- pytorch/vision#1376
- pytorch/vision#1363
- pytorch/vision#1353
- pytorch/vision#1303

This will fail the CI test when a diff uses changes made in those PRs. Before a new official version of TorchVision is released, we can temporarily use the nightly torchvision to get all the recent PRs, and unblock the PR merging. We plan to use a fixed version of TorchVision later.

Reviewed By: vreis
Differential Revision: D17944239
fbshipit-source-id: 86ff540e3fc4f08ef767e84ef103525db5158201

* [video reader] inception commit
* add method save_metadata to class VideoClips in video_utils.py
* add load_metadata() method to VideoClips class
* add Exception to not catch unexpected events such as memory erros, interrupt
* fix bugs in video_plus.py
* [video reader]remove logging. update setup.py
* remove time measurement in test_video_reader.py
* Remove glog and try making ffmpeg finding more robust
* Add ffmpeg to conda build
* Add ffmpeg to conda build [again]
* Make library path finding more robust
* Missing import
* One more missing fix for import
* Py2 compatibility and change package to av to avoid version conflict with ffmpeg
* Fix for python2
* [video reader] support to decode one stream only (e.g. video/audio stream)
* remove argument _precomputed_metadata_filepath
* remove save_metadata method
* add get_metadata method
* expose _precomputed_metadata and frame_rate arguments in video dataset __init__ method
* remove ssize_t
* remove size_t to pass CI check on Windows
* add PyInit__video_reader function to pass CI check on Windows
* minor fix to define PyInit_video_reader symbol
* Make c++ video reader optional
* Temporarily revert changes to test_io
* Revert changes to python files
* Rename files to make it private
* Fix python lint
* Fix C++ lint
* add a functor object EnumClassHash to make Enum class instances usable as key type of std::unordered_map
* fix cpp format check
* Fix cherry-pick conflict for 0.4.2 release

This was referenced Sep 6, 2019

[torchvision video reader]inception commit #1279

Closed

[video dataset]expose more arguments of VideoClips in video datasets #1310

Closed

fmassa mentioned this pull request Sep 10, 2019

modified code of io.read_video to interpret start_pts and end_pts in seconds #1313

Closed

fmassa force-pushed the video_reader branch from a087905 to ade1cf7 Compare September 13, 2019 19:25

fmassa self-requested a review September 18, 2019 17:28

fmassa reviewed Sep 18, 2019

torchvision/csrc/cpu/video_reader/VideoReader.cpp

fmassa reviewed Sep 18, 2019

torchvision/csrc/cpu/video_reader/VideoReader.cpp (outdated)

fmassa force-pushed the video_reader branch from f989b7d to 2f6ede3 Compare September 18, 2019 19:51

zyan3 and others added 19 commits September 18, 2019 18:27

[video reader] inception commit

d4f6687

add method save_metadata to class VideoClips in video_utils.py

e8b2aed

add load_metadata() method to VideoClips class

5da3c20

add Exception to not catch unexpected events such as memory erros, interrupt

c067796

fix bugs in video_plus.py

97ece7f

[video reader]remove logging. update setup.py

3e3afbe

remove time measurement in test_video_reader.py

86e444d

Remove glog and try making ffmpeg finding more robust

f94d747

Add ffmpeg to conda build

bdd2c5a

Add ffmpeg to conda build [again]

3747d08

Make library path finding more robust

3664075

Missing import

45d96d2

One more missing fix for import

eeeac12

Py2 compatibility and change package to av to avoid version conflict with ffmpeg

6d71938

Fix for python2

1063a72

[video reader] support to decode one stream only (e.g. video/audio stream)

ecbbea5

remove argument _precomputed_metadata_filepath

8877673

remove save_metadata method

3a147b8

add get_metadata method

8e6c5dc

zyan3 and others added 9 commits September 18, 2019 18:27

expose _precomputed_metadata and frame_rate arguments in video dataset __init__ method

cf067c3

remove ssize_t

33b2ab5

remove size_t to pass CI check on Windows

4071e8f

add PyInit__video_reader function to pass CI check on Windows

9184f37

minor fix to define PyInit_video_reader symbol

6b8df4a

Make c++ video reader optional

7edcb6b

Temporarily revert changes to test_io

d2981bd

Revert changes to python files

4ea0440

Rename files to make it private

83c78e9

fmassa force-pushed the video_reader branch from a8351e2 to 83c78e9 Compare September 19, 2019 13:12

fmassa added 2 commits September 19, 2019 10:38

Fix python lint

37a2874

Fix C++ lint

906348c

zyan3 added 2 commits September 19, 2019 10:57

add a functor object EnumClassHash to make Enum class instances usable as key type of std::unordered_map

be42248

fix cpp format check

5ecbd6a

fmassa mentioned this pull request Sep 20, 2019

Expose frame-rate and cache to video datasets #1356

Merged

fmassa merged commit 31fad34 into pytorch:master Sep 20, 2019

bjuncek mentioned this pull request Sep 23, 2019

Expose num_workers in VideoClips #1359

Merged

stephenyan1231 mentioned this pull request Sep 23, 2019

add _backend argument to __init__() of class VideoClips #1363

Merged

stephenyan1231 mentioned this pull request Oct 16, 2019

use nightly torchvision and torch 1.3 facebookresearch/ClassyVision#62

Closed

fmassa mentioned this pull request Oct 31, 2019

[v0.4.2] Release Tracker #1545

Closed

mthrok mentioned this pull request Dec 13, 2022

Fallback to best_effort_timestamp in case of invalid PTS pytorch/audio#2916

Closed
