Prior to trying PyAV I was only accessing ffmpeg via `subprocess` and `ffmpeg-normalize`. This led to a lot of overhead. If you want the details, I describe them in the next paragraph; if not, feel free to skip it. My point is that I had a lot of both runtime and I/O overhead.
This had the drawback that `subprocess` carries a lot of overhead with every call. `subprocess`-based filtering couldn't talk efficiently to `ffmpeg-normalize`, which led to some hacky workarounds and otherwise unnecessary file I/O. I also found that `ffmpeg-normalize` with `libmp3lame` triggers a bug when trying to normalize `.mp3` files, so for every audio file I processed I had to: filter an `.mp3`, convert the `.mp3` to another format (I chose `.wav` as it's the best supported by `ffmpeg-normalize`), read the `.wav`, invoke `ffmpeg-normalize` on the `.wav`, convert the `.wav` back to `.mp3`, write the `.mp3` from `ffmpeg-normalize`, read the `.mp3`, and finish filtering.
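For concreteness, the old per-file pipeline looked roughly like this (a minimal sketch, not my exact commands; the file names and the `atempo` filter are placeholders):

```python
import subprocess

src = "input.mp3"  # placeholder path

# 1. apply the filter to the .mp3
subprocess.run(["ffmpeg", "-y", "-i", src, "-af", "atempo=1.25", "filtered.mp3"], check=True)

# 2. convert to .wav, since ffmpeg-normalize + libmp3lame misbehaves on .mp3 input
subprocess.run(["ffmpeg", "-y", "-i", "filtered.mp3", "filtered.wav"], check=True)

# 3. loudness-normalize the .wav
subprocess.run(["ffmpeg-normalize", "filtered.wav", "-o", "normalized.wav"], check=True)

# 4. convert back to .mp3
subprocess.run(["ffmpeg", "-y", "-i", "normalized.wav", "-c:a", "libmp3lame", "output.mp3"], check=True)
```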
I tried PyAV because with direct bindings I'd be able to cut out the superfluous `subprocess` calls and I/O operations.
I was surprised to see my PyAV implementation was significantly slower (~9 times slower without threading and ~2.5 times slower with threading) than just calling `subprocess` repeatedly.
From profiling, it appears the bulk of the time is spent `push`ing and `pull`ing frames through the filter graph. Each individual frame object must be converted to its C object (small overhead), processed inside ffmpeg (pretty dang fast), and then converted back to Python (small overhead). The problem is that this small overhead occurs for every one of the thousands of frames in every file processed, and it multiplies.
You can alleviate `push` overhead by `push`ing more often and letting the results build up in the underlying C buffer. Then when you `pull`, you pull a batch all at once. But I suspect letting frames build up in an underlying C buffer still incurs a lot of reallocation, because there's no way to indicate to the underlying function how much it should expect you to push. And this workaround for `pull`ing doesn't help at all with batch `push`ing.
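That workaround looks roughly like this (a minimal sketch, assuming an already-configured `graph` and an iterable of decoded `frames`; `BATCH` is an arbitrary value I picked for illustration):

```python
import av

BATCH = 64  # arbitrary: how many frames to push before draining the sink

def process_frames_batched(frames, graph):
    """Push BATCH frames at a time, then drain whatever the sink has ready."""
    processed = []
    pending = 0
    for frame in frames:
        graph.push(frame)
        pending += 1
        if pending >= BATCH:
            pending = 0
            while True:
                try:
                    processed.append(graph.pull())
                except (av.BlockingIOError, av.EOFError):
                    break  # nothing more available right now
    graph.push(None)  # signal end of input
    while True:  # final drain
        try:
            processed.append(graph.pull())
        except (av.BlockingIOError, av.EOFError):
            break
    return processed
```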
Threading will divide the overhead between threads, but the aggregate overhead (the per-frame Py→C and C→Py conversion cost, paid once per push, which is once per frame, plus once per batch of pulls, times the average frames per file, times the number of files) won't be decreased, just divided. And that's not enough to offset this sizeable performance hit. Maybe if you have a Threadripper... a man can dream.
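What I mean by "divided, not decreased" (a minimal sketch; `process_file` is a hypothetical per-file worker and the worker count is arbitrary):

```python
from concurrent.futures import ThreadPoolExecutor

def process_file(path):
    """Hypothetical worker: decode `path`, run it through its own filter graph,
    and return the processed frames."""
    ...

files = ["a.mp3", "b.mp3", "c.mp3"]  # placeholder file list

# The per-frame Python<->C conversion cost is still paid for every frame of
# every file; the threads just pay it concurrently (and the GIL limits how
# much of the pure-Python part actually overlaps).
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_file, files))
```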
I think the only solution is to add a layer of C wrappers around ffmpeg's filtering functions that essentially just hold the converted data (Py to C), pass that converted data to the underlying ffmpeg functions, poll for completion, accumulate the returned data, and return once all processed frames are available. Essentially this would enable efficient batch processing to drastically reduce overhead.
I read through the parts of the PyAV documentation that looked potentially relevant, as well as the GitHub example file for audio processing (my use case). But I apologize if I'm missing something and there's already a neat way to do this. Also, this may just be outside the scope of this project, but I wanted to try to contribute a bit in my own very limited way by bringing it up :)
It also occurred to me that some part of ffmpeg must handle passing frames to the underlying filter functions, so maybe that would be a place to look? I couldn't get the project to build, and the source code was way over my head, so I'm kind of shot on that.
Here's how I was processing frames (link to the file):
```python
import av

def process_frames(frames, graph):
    processed_frames = []
    frame_iter = iter(frames)
    has_frames_to_push = True
    while True:
        # try to push the next input frame, if available
        if has_frames_to_push:
            try:
                frame = next(frame_iter)
                graph.push(frame)
            except StopIteration:
                has_frames_to_push = False
                graph.push(None)  # signal end of input
            except (av.BlockingIOError, av.EOFError):
                # in case of an implementation that's not like an fsm and not done,
                # but was actually just blocking
                # benign: just means the graph isn't ready yet
                pass
        # poll to pull available frames
        while True:
            try:
                f = graph.pull()
                if f is None and not has_frames_to_push:
                    # done.
                    return processed_frames
                elif f is not None:
                    processed_frames.append(f)
                break  # break if not done - to poll or push more frames
            except av.BlockingIOError:
                # graph is not ready for more output yet,
                # attempt to push more frames (good if like fsm),
                # polls if not able to push
                break
            except av.EOFError:
                # some implementations let you know they're done via this error :')
                return processed_frames
```
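For context, the graph I fed this function was built roughly like this (a minimal sketch, not my exact code; the input path and the `atempo` filter and its argument are placeholders):

```python
import av

container = av.open("input.mp3")      # placeholder path
in_stream = container.streams.audio[0]

# source buffer -> some audio filter -> sink
graph = av.filter.Graph()
src = graph.add_abuffer(template=in_stream)
flt = graph.add("atempo", "1.25")
sink = graph.add("abuffersink")
src.link_to(flt)
flt.link_to(sink)
graph.configure()

frames = container.decode(in_stream)
processed = process_frames(frames, graph)
print(f"{len(processed)} frames out")
```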