Re-encoding with variable frame rate #1959

@benedikt-grl

Description

First of all, thank you for maintaining PyAV. It is a very valuable tool.

I wanted to use PyAV to cut a video into shorter segments. The input video has a variable frame rate (mostly close to 30 fps, but sometimes considerably lower), and I would like to preserve the presentation timestamps. The issue is that I find it difficult to control the presentation timestamps of the resulting output videos.

To simplify the problem, I created a minimal example that decodes an input video frame by frame and re-encodes it with the original presentation timestamps.

import av

input_path = "..."
output_path = "..."

# Open the input container
input_container  = av.open(input_path)
input_stream  = input_container.streams.video[0]

# Open the output container
output_container = av.open(output_path, "w")

# Create output stream and copy some options from the input stream
out_stream = output_container.add_stream("libx264")
out_stream.width = input_stream.width
out_stream.height = input_stream.height
out_stream.pix_fmt = "yuv420p"
out_stream.time_base   = input_stream.time_base
out_stream.options = {
    "crf": "0",
    "preset": "slow",
    "profile": "high444",
    "bf": "0",
    "colorprim": "bt709",
    "transfer": "bt709",
    "colormatrix": "bt709",
    "x264opts": "cabac=1",
}

# Iterate over the frames
for input_frame in input_container.decode(video=0):

    # Create a new frame with the pixel values of the input frame
    output_frame = av.VideoFrame.from_ndarray(input_frame.to_ndarray(), format=input_frame.format.name)

    # Copy pts from the input frame
    output_frame.pts = input_frame.pts

    # Encode and mux
    for packet in out_stream.encode(output_frame):
        output_container.mux(packet)

# Flush
for packet in out_stream.encode():
    output_container.mux(packet)

# Close containers
output_container.close()
input_container.close()

Let's compare the input and output video with ffprobe:

# Input video
Video: h264 (High 4:4:4 Predictive), yuv420p(tv, bt709, progressive), 1280x720 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn (default)

# Output video
Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 327 kb/s, 0.73 fps, 24 tbr, 16k tbn (default)

The problem is that the frame rate of the output video is much lower than that of the original video.

I tried several variants of the code above:

(1) Instead of out_stream.encode(output_frame), call out_stream.encode(input_frame), as sketched below.
The program fails with a ValueError. I assume this is related to some attribute of the input_frame object that the encoder rejects, e.g., its DTS. To work around this, I created a new frame object from the input frame's pixel data, as in the example above.
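
Only the inner loop changes relative to the example above; the comment marks where the ValueError is raised in my runs:

for input_frame in input_container.decode(video=0):
    # Pass the decoded frame to the encoder directly instead of copying it
    # into a fresh VideoFrame. This is the line that raises a ValueError for me.
    for packet in out_stream.encode(input_frame):
        output_container.mux(packet)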

(2) Set the time scale of the output container, i.e., output_container = av.open(output_path, "w", container_options={"video_track_timescale": "1000"}).
ffprobe now shows:

Stream #0:0[0x1](und): Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 327 kb/s, 0.73 fps, 24 tbr, 1k tbn (default)

The container timebase (tbn) now matches the input video, but this had no effect on the frame rate.
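
As an aside, I picked 1000 so that the timescale matches the input's 1k tbn. Deriving it from the input stream instead of hardcoding it would look like this (a sketch on my side, assuming the input time base has the form 1/N, as it does here):

# The input time base is 1/1000, so its denominator is the desired timescale.
timescale = str(input_stream.time_base.denominator)
output_container = av.open(output_path, "w", container_options={"video_track_timescale": timescale})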

(3) Explicitly specify a rate, i.e., out_stream = output_container.add_stream("libx264", rate=30)
ffprobe now shows:

Stream #0:0[0x1](und): Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 408 kb/s, 0.91 fps, 30 tbr, 16k tbn (default)

The tbr changed from 24 to 30, but the frame rate is still below 1 fps.

(4) Explicitly set a packet time base before muxing, i.e.

# Encode and mux
for packet in out_stream.encode(output_frame):
    packet.time_base = input_stream.time_base
    output_container.mux(packet)

ffprobe now shows:

Stream #0:0[0x1](und): Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 13631 kb/s, 30.18 fps, 30 tbr, 16k tbn (default)

This is very close to the desired output, but I am skeptical that this is the correct solution, because I have not seen any examples that set the packet time base. Also, the fps and the total duration do not exactly match the input file.

Frankly, I am confused about which arguments I have to set in order to carry over the input video's presentation timestamps. It would be super helpful to have an example that explains the effects of video_track_timescale, the optional rate argument of output_container.add_stream, as well as output_stream.time_base, output_frame.time_base, and packet.time_base. Something is rescaling the timestamps, and I would like to understand what it is. My current best guess is sketched below.
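
For reference, this is the variant I would try next, but I have not verified it. The assumption on my side is that setting output_frame.time_base tells the encoder which time base the copied pts is expressed in, so that it can perform the rescaling itself:

# Same setup as in the first example, with the stream time base copied from the input.
out_stream.time_base = input_stream.time_base

for input_frame in input_container.decode(video=0):
    output_frame = av.VideoFrame.from_ndarray(input_frame.to_ndarray(), format=input_frame.format.name)

    # Copy the pts and (assumption) declare which time base it refers to.
    output_frame.pts = input_frame.pts
    output_frame.time_base = input_frame.time_base

    for packet in out_stream.encode(output_frame):
        output_container.mux(packet)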

Thanks for your help!
