Re-encoding with variable frame rate #1959

@benedikt-grl

Description

First of all, thank you for maintaining PyAV. It is a very valuable tool.

I wanted to use PyAV to cut a video into shorter segments. The input video has a variable frame rate (mostly close to 30 fps, but sometimes considerably lower), and I would like to preserve the presentation timestamps. The issue is that I find it difficult to control the presentation timestamps of the resulting output videos.

To simplify the problem, I created a minimal example that decodes an input video frame by frame and re-encodes it with the original presentation timestamps.

import av

input_path = "..."
output_path = "..."

# Open the input container
input_container  = av.open(input_path)
input_stream  = input_container.streams.video[0]

# Open the output container
output_container = av.open(output_path, "w")

# Create output stream and copy some options from the input stream
out_stream = output_container.add_stream("libx264")
out_stream.width = input_stream.width
out_stream.height = input_stream.height
out_stream.pix_fmt = "yuv420p"
out_stream.time_base   = input_stream.time_base
out_stream.options = {
    "crf": "0",
    "preset": "slow",
    "profile": "high444",
    "bf": "0",
    "colorprim": "bt709",
    "transfer": "bt709",
    "colormatrix": "bt709",
    "x264opts": "cabac=1",
}

# Iterate over the frames
for input_frame in input_container.decode(video=0):

    # Create a new frame with the pixel values of the input frame
    output_frame = av.VideoFrame.from_ndarray(input_frame.to_ndarray(), format=input_frame.format.name)

    # Copy pts from the input frame
    output_frame.pts = input_frame.pts

    # Encode and mux
    for packet in out_stream.encode(output_frame):
        output_container.mux(packet)

# Flush
for packet in out_stream.encode():
    output_container.mux(packet)

# Close containers
output_container.close()
input_container.close()

Let's compare the input and output video with ffprobe:

# Input video
Video: h264 (High 4:4:4 Predictive), yuv420p(tv, bt709, progressive), 1280x720 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn (default)

# Output video
Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 327 kb/s, 0.73 fps, 24 tbr, 16k tbn (default)

The problem is that the frame rate of the output video is much lower than that of the original video.

I tried several variants of the code above:

(1) Instead of out_stream.encode(output_frame), call out_stream.encode(input_frame), as sketched below.
The program fails with a ValueError. I assume this is related to some attribute of the input_frame object that the encoder rejects, e.g., its DTS. To work around this, I created a new frame object from the input frame's pixel data, as in the example above.
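
Only the inner loop changes relative to the example above; the comment marks where the ValueError is raised in my runs:

for input_frame in input_container.decode(video=0):
    # Pass the decoded frame to the encoder directly instead of copying it
    # into a fresh VideoFrame. This is the line that raises a ValueError for me.
    for packet in out_stream.encode(input_frame):
        output_container.mux(packet)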

(2) Set the time scale of the output container, i.e., output_container = av.open(output_path, "w", container_options={"video_track_timescale": "1000"}).
ffprobe now shows:

Stream #0:0[0x1](und): Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 327 kb/s, 0.73 fps, 24 tbr, 1k tbn (default)

The container timebase (tbn) now matches the input video, but this had no effect on the frame rate.
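
As an aside, I picked 1000 so that the timescale matches the input's 1k tbn. Deriving it from the input stream instead of hardcoding it would look like this (a sketch on my side, assuming the input time base has the form 1/N, as it does here):

# The input time base is 1/1000, so its denominator is the desired timescale.
timescale = str(input_stream.time_base.denominator)
output_container = av.open(output_path, "w", container_options={"video_track_timescale": timescale})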

(3) Explicitly specify a rate, i.e., out_stream = output_container.add_stream("libx264", rate=30)
ffprobe now shows:

Stream #0:0[0x1](und): Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 408 kb/s, 0.91 fps, 30 tbr, 16k tbn (default)

The tbr changed from 24 to 30, but the frame rate is still below 1 fps.

(4) Explicitly set a packet time base before muxing, i.e.

# Encode and mux
for packet in out_stream.encode(output_frame):
    packet.time_base = input_stream.time_base
    output_container.mux(packet)

ffprobe now shows:

Stream #0:0[0x1](und): Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 13631 kb/s, 30.18 fps, 30 tbr, 16k tbn (default)

This is very close to the desired output, but I am skeptical that this is the correct solution, because I have not seen any examples that set the packet time base. Also, the fps and the total duration do not exactly match the input file.

Frankly, I am confused about which arguments I have to set in order to carry over the input video's presentation timestamps. It would be super helpful to have an example that explains the effects of video_track_timescale, the optional rate argument of output_container.add_stream, as well as output_stream.time_base, output_frame.time_base, and packet.time_base. Something is rescaling the timestamps, and I would like to understand what it is. My current best guess is sketched below.
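
For reference, this is the variant I would try next, but I have not verified it. The assumption on my side is that setting output_frame.time_base tells the encoder which time base the copied pts is expressed in, so that it can perform the rescaling itself:

# Same setup as in the first example, with the stream time base copied from the input.
out_stream.time_base = input_stream.time_base

for input_frame in input_container.decode(video=0):
    output_frame = av.VideoFrame.from_ndarray(input_frame.to_ndarray(), format=input_frame.format.name)

    # Copy the pts and (assumption) declare which time base it refers to.
    output_frame.pts = input_frame.pts
    output_frame.time_base = input_frame.time_base

    for packet in out_stream.encode(output_frame):
        output_container.mux(packet)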

Thanks for your help!
