Description
First of all, thank you for maintaining PyAV. It is a very valuable tool.
I wanted to use PyAV to cut a video into shorter segments. The input video has a variable frame rate (mostly close to 30 fps, but sometimes also a lot smaller) and I would like to maintain the presentation timestamps. The issue is that I find it difficult to control the presentation timestamps in the resulting output videos.
To simplify the problem, I created a simple example that should decode an input video frame by frame and re-encode it with the original presentation timestamps.
import av

input_path = "..."
output_path = "..."

# Open the input container
input_container = av.open(input_path)
input_stream = input_container.streams.video[0]

# Open the output container
output_container = av.open(output_path, "w")

# Create the output stream and copy some options from the input stream
out_stream = output_container.add_stream("libx264")
out_stream.width = input_stream.width
out_stream.height = input_stream.height
out_stream.pix_fmt = "yuv420p"
out_stream.time_base = input_stream.time_base
out_stream.options = {
    "crf": "0",
    "preset": "slow",
    "profile": "high444",
    "bf": "0",
    "colorprim": "bt709",
    "transfer": "bt709",
    "colormatrix": "bt709",
    "x264opts": "cabac=1",
}

# Iterate over the frames
for input_frame in input_container.decode(video=0):
    # Create a new frame with the pixel values of the input frame
    output_frame = av.VideoFrame.from_ndarray(input_frame.to_ndarray(), format=input_frame.format.name)
    # Copy pts from the input frame
    output_frame.pts = input_frame.pts
    # Encode and mux
    for packet in out_stream.encode(output_frame):
        output_container.mux(packet)

# Flush the encoder
for packet in out_stream.encode():
    output_container.mux(packet)

# Close the containers
output_container.close()
input_container.close()
Let's compare the input and output videos with ffprobe:
# Input video
Video: h264 (High 4:4:4 Predictive), yuv420p(tv, bt709, progressive), 1280x720 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn (default)
# Output video
Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 327 kb/s, 0.73 fps, 24 tbr, 16k tbn (default)
The problem is that the frame rate of the output video is much lower than that of the original video.
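For what it's worth, the reported 0.73 fps is consistent with the copied pts values being reinterpreted in a different time base. A small sketch of that arithmetic, assuming an input pts step of about 33 ticks at the 1/1000 time base (≈30 fps) and an encoder time base of 1/24 when no rate is given; both numbers are assumptions on my part, not confirmed:

```python
from fractions import Fraction

# Input: pts advance ~33 ticks per frame in a 1/1000 s time base.
pts_step = 33
input_time_base = Fraction(1, 1000)
print(float(1 / (pts_step * input_time_base)))  # ~30.3 fps, close to the input's 30 fps

# If the same pts values are interpreted in a 1/24 time base instead,
# each frame appears to last 33/24 seconds:
encoder_time_base = Fraction(1, 24)
frame_interval = pts_step * encoder_time_base  # 33/24 s
print(float(1 / frame_interval))  # ~0.727, matching the 0.73 fps ffprobe reports
```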
I tried several variants of the code above:
(1) Instead of out_stream.encode(output_frame), do out_stream.encode(input_frame). The program fails with a ValueError. I assume it could be related to some parameters set on the input_frame object that the encoder doesn't like, e.g., some DTS. To work around this, I created a new frame object based on the input frame's data.
(2) Set the time scale of the output container, i.e., output_container = av.open(output_path, "w", container_options={"video_track_timescale": "1000"}).
ffprobe now shows:
Stream #0:0[0x1](und): Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 327 kb/s, 0.73 fps, 24 tbr, 1k tbn (default)
The container timebase (tbn) now matches the input video, but this had no effect on the frame rate.
(3) Explicitly specify a rate, i.e., out_stream = output_container.add_stream("libx264", rate=30).
ffprobe now shows:
Stream #0:0[0x1](und): Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 408 kb/s, 0.91 fps, 30 tbr, 16k tbn (default)
The tbr changed from 24 to 30, but the frame rate is still below 1 fps.
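The same arithmetic as above also reproduces this number, again assuming the ~33-tick pts step is carried over unchanged and that rate=30 makes the encoder time base 1/30:

```python
from fractions import Fraction

# ~33-tick pts step interpreted in a 1/30 time base:
fps = 1 / (33 * Fraction(1, 30))
print(float(fps))  # ~0.909, matching the 0.91 fps from ffprobe
```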
(4) Explicitly set a packet time base before muxing, i.e.,

# Encode and mux
for packet in out_stream.encode(output_frame):
    packet.time_base = input_stream.time_base
    output_container.mux(packet)
ffprobe now shows:
Stream #0:0[0x1](und): Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 13631 kb/s, 30.18 fps, 30 tbr, 16k tbn (default)
This is very close to the desired output, but I am skeptical whether this is the correct solution because I have not seen any examples that set the packet time base. Also, the fps and the total duration do not perfectly match the input file.
Frankly, I am confused about which arguments I have to set to copy the input video's presentation timestamps. It would be super helpful to have an example that explains the effect of video_track_timescale, the optional rate argument of output_container.add_stream, as well as output_stream.time_base, output_frame.time_base, and packet.time_base. Something seems to be rescaling the timestamps, and I would like to understand what it is.
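In case it helps anyone investigating: rescaling a timestamp between two time bases is just a multiplication by the ratio of the bases. A minimal, PyAV-independent sketch of what a correct rescale would have to do (the concrete numbers are made up for illustration):

```python
from fractions import Fraction

def rescale_pts(pts: int, src_tb: Fraction, dst_tb: Fraction) -> int:
    """Re-express a timestamp given in src_tb units in dst_tb units."""
    return round(pts * src_tb / dst_tb)

# A pts of 33 in a 1/1000 s time base marks the instant 0.033 s;
# in a 1/16000 s time base, the same instant is 528 ticks.
print(rescale_pts(33, Fraction(1, 1000), Fraction(1, 16000)))  # 528
```

If no layer performs this multiplication when the frame/packet time base differs from the encoder or container time base, the raw pts values get silently reinterpreted, which would explain the distorted frame rates above.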
Thanks for your help!