Consider marking an I-frame with Recovery Point SEI message as h264 key frame #650

@reinhrst

At the start of decoding (and after a flush), the WebCodecs VideoDecoder requires a key frame, which for H.264 is currently defined as an IDR frame.

H.264 has the concept of a Recovery Point SEI message (D.2.8 in the 08/2021 edition of the H.264 spec): "The recovery point SEI message assists a decoder in determining when the decoding process will produce acceptable pictures for display after the decoder initiates random access or after the encoder indicates a broken link in the coded video sequence."

So (afaict) an I-frame with such an SEI message is meant to be usable as the start frame for a decoding operation.

ffprobe also marks these frames as key frames.
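To make the idea concrete, here is a rough sketch of how a client could detect such frames itself. It assumes the access unit is in Annex B format (start-code delimited), and all function and type names (`inspectAccessUnit`, `isRandomAccessCandidate`, and so on) are made up for illustration; they are not part of WebCodecs or ffmpeg. It only looks for an IDR slice or a recovery point SEI and does not verify that the slices are actually I slices.

```typescript
// NAL unit types (H.264 Table 7-1) and the SEI payloadType for "recovery point".
const NAL_IDR_SLICE = 5;
const NAL_SEI = 6;
const SEI_RECOVERY_POINT = 6;

interface RecoveryPoint {
  recoveryFrameCnt: number;
  exactMatchFlag: boolean;
  brokenLinkFlag: boolean;
}

interface AccessUnitInfo {
  hasIdr: boolean;
  recoveryPoint: RecoveryPoint | null;
}

// Split an Annex B byte stream on 00 00 01 start codes (hypothetical helper).
function splitAnnexB(data: Uint8Array): Uint8Array[] {
  const nals: Uint8Array[] = [];
  const pushNal = (from: number, to: number): void => {
    let end = to;
    while (end > from && data[end - 1] === 0) end--; // drop zero padding
    if (end > from) nals.push(data.subarray(from, end));
  };
  let start = -1;
  for (let i = 0; i + 2 < data.length; i++) {
    if (data[i] === 0 && data[i + 1] === 0 && data[i + 2] === 1) {
      if (start >= 0) pushNal(start, i);
      start = i + 3;
      i += 2;
    }
  }
  if (start >= 0) pushNal(start, data.length);
  return nals;
}

// Skip the 1-byte NAL header and remove emulation prevention bytes (00 00 03 -> 00 00).
function toRbsp(nal: Uint8Array): number[] {
  const out: number[] = [];
  let zeros = 0;
  for (let i = 1; i < nal.length; i++) {
    const b = nal[i] ?? 0;
    if (zeros >= 2 && b === 3) { zeros = 0; continue; }
    out.push(b);
    zeros = b === 0 ? zeros + 1 : 0;
  }
  return out;
}

// Parse the first fields of the recovery point payload:
// recovery_frame_cnt ue(v), exact_match_flag u(1), broken_link_flag u(1).
function parseRecoveryPoint(payload: number[]): RecoveryPoint {
  let pos = 0;
  const readBit = (): number => {
    const byte = payload[pos >> 3] ?? 0;
    const bit = (byte >> (7 - (pos & 7))) & 1;
    pos++;
    return bit;
  };
  let leadingZeros = 0; // Exp-Golomb prefix length
  while (leadingZeros < 32 && readBit() === 0) leadingZeros++;
  let suffix = 0;
  for (let i = 0; i < leadingZeros; i++) suffix = suffix * 2 + readBit();
  const recoveryFrameCnt = Math.pow(2, leadingZeros) - 1 + suffix;
  const exactMatchFlag = readBit() === 1;
  const brokenLinkFlag = readBit() === 1;
  return { recoveryFrameCnt, exactMatchFlag, brokenLinkFlag };
}

function inspectAccessUnit(data: Uint8Array): AccessUnitInfo {
  const info: AccessUnitInfo = { hasIdr: false, recoveryPoint: null };
  for (const nal of splitAnnexB(data)) {
    const nalType = (nal[0] ?? 0) & 0x1f;
    if (nalType === NAL_IDR_SLICE) info.hasIdr = true;
    if (nalType !== NAL_SEI) continue;
    const rbsp = toRbsp(nal);
    let i = 0;
    while (i < rbsp.length - 1) { // the last byte holds the RBSP trailing bits
      let type = 0;
      while (rbsp[i] === 0xff) { type += 255; i++; }
      type += rbsp[i++] ?? 0;
      let size = 0;
      while (rbsp[i] === 0xff) { size += 255; i++; }
      size += rbsp[i++] ?? 0;
      if (type === SEI_RECOVERY_POINT) {
        info.recoveryPoint = parseRecoveryPoint(rbsp.slice(i, i + size));
      }
      i += size;
    }
  }
  return info;
}

// Assumed policy, mirroring this proposal: IDR, or recovery point with recovery_frame_cnt == 0.
function isRandomAccessCandidate(info: AccessUnitInfo): boolean {
  return info.hasIdr ||
    (info.recoveryPoint !== null && info.recoveryPoint.recoveryFrameCnt === 0);
}
```

For the I-frames in my files (exact_match_flag=1, recovery_frame_cnt=0), a check like this would flag them as possible start points even though they are not IDR frames.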

I don't have enough data to comment on how often this happens in real-life video streams. Personally, I have thousands of hours of video taken with different JVC / Sony camcorders (time-lapse recordings, used in animal conservation projects), which have the following properties:

  • Stream starts (when record button is pressed) with IDR frame
  • IBBPBBPBBPBBI GOPs, where every I-frame has Recovery Point SEI message with exact_match_flag=1 and recovery_frame_cnt=0
  • IDR frames repeat every 300 frames (every 25 GOPs)
  • Streams get "cut" into a new file after 4GB of recording; the new file starts with an I-frame, but that frame is not guaranteed to be an IDR frame.

Not being able to start decoding on I-frame + SEI means that:

  • Worst case, the first 24 GOPs of a stream cannot be decoded without access to the previous file.
  • When random access is needed in the decoder, worst case 299 frames need to be decoded before the requested frame can be shown. This takes about 0.25s on my M1 MacBook: not the end of the world, but not a smooth drag-playhead-and-find experience for users either. Note that the video files are generally 4GB each, so decoding all frames up front is not a solution either.
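The numbers above follow directly from the stream layout. A small sketch of the arithmetic, assuming frames are counted in decode order and ignoring B-frame reordering (the constant and function names are just for illustration):

```typescript
// Stream layout taken from the recordings described above.
const GOP_PATTERN = "IBBPBBPBBPBB";   // one GOP, the next one starts with I again
const IDR_INTERVAL_FRAMES = 300;      // an IDR frame every 300 frames

const gopLength = GOP_PATTERN.length;                 // 12 frames per GOP
const gopsPerIdr = IDR_INTERVAL_FRAMES / gopLength;   // 25 GOPs between IDR frames

// Frames that must be decoded before `target` can be shown, when decoding may
// only start at frame indices that are multiples of `startInterval`.
function framesDecodedBeforeTarget(target: number, startInterval: number): number {
  return target % startInterval;
}

// Worst case for a seek: the target is the last frame before the next start point.
function worstCaseSeekCost(startInterval: number): number {
  return startInterval - 1;
}

// Worst case for a file cut right after an IDR GOP: every remaining GOP of the
// IDR period is undecodable without the previous file.
const worstCaseUndecodableGops = gopsPerIdr - 1;                        // 24
const worstCaseUndecodableFrames = worstCaseUndecodableGops * gopLength; // 288

const idrOnlyWorstCase = worstCaseSeekCost(IDR_INTERVAL_FRAMES); // 299
const iFrameWorstCase = worstCaseSeekCost(gopLength);            // 11
```

So if decoding could start at any I-frame with a recovery point, the worst-case seek would drop from 299 to 11 frames for these files.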

A client-side workaround (short of re-encoding, which results in unacceptable quality loss) that kind of seems to work, but is probably a very bad idea, is to feed the decoder a dummy IDR frame before the real stream and then drop the first frame of the output.
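The "drop the first output" half of that workaround could look roughly like this. It is only a sketch: `dropFirstOutput` is a made-up helper, and the commented usage assumes a pre-built `dummyIdrBytes` buffer and a `render` function that are not shown here. How to construct a dummy IDR frame that matches the stream's SPS/PPS is the hard part and is not covered.

```typescript
// Anything with a close() method, e.g. a WebCodecs VideoFrame.
interface Closable {
  close(): void;
}

// Wrap an output callback so that the first decoded frame (the one produced by
// the dummy IDR chunk) is closed and discarded, and all later frames pass through.
function dropFirstOutput<T extends Closable>(onFrame: (frame: T) => void): (frame: T) => void {
  let dropped = false;
  return (frame: T): void => {
    if (!dropped) {
      dropped = true;
      frame.close(); // release the dummy frame's resources
      return;
    }
    onFrame(frame);
  };
}

// Hypothetical usage in a browser (not executed here):
//
//   const decoder = new VideoDecoder({
//     output: dropFirstOutput((frame: VideoFrame) => render(frame)),
//     error: (e) => console.error(e),
//   });
//   decoder.configure(config);
//   decoder.decode(new EncodedVideoChunk({ type: "key", timestamp: 0, data: dummyIdrBytes }));
//   // ...then feed the real stream, starting at the I-frame with the recovery point SEI.
```

This relies on the decoder emitting the dummy frame as its first output, which is what I observe but not something I can guarantee.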

Labels

  • maybe: Ideas that might be in scope, and worth discussing
  • need-definition: An issue where something needs to be specified normatively
  • registry: pertains to new or updated registry entry