
[Discussion] Decreasing binding calls #2009

@shford

Description

Prior to trying PyAV I was only accessing ffmpeg via subprocess and ffmpeg-normalize. This led to a lot of overhead. I describe the details in the next paragraph; feel free to skip it. The short version is that I had a lot of both runtime and I/O overhead.

This had the drawback that subprocess incurs a lot of overhead with every call. subprocess filtering also couldn't talk efficiently to ffmpeg-normalize, which led to some hacky workarounds and otherwise unnecessary file I/O. I found that ffmpeg-normalize with libmp3lame triggers a bug when normalizing .mp3 files, so every audio file I processed required: filtering a .mp3, converting the .mp3 to another format (I chose .wav, as it's the best supported by ffmpeg-normalize), reading the .wav, invoking ffmpeg-normalize on the .wav, converting the .wav back to .mp3, writing the .mp3 from ffmpeg-normalize, reading the .mp3, and finishing filtering.

I tried PyAV because with direct bindings I'd be able to cut out the superfluous subprocess calls and I/O operations.

I was surprised to see my PyAV implementation was significantly slower (~9 times slower without threading & ~2.5 times slower with threading) than just calling subprocess repeatedly.

From profiling, it appears the bulk of the time is spent pushing and pulling frames through the filter graph. Each individual frame object must be converted to its C object (small overhead), processed inside ffmpeg (pretty dang fast), and then converted back to Python (small overhead). The problem is that this small overhead is paid for every one of the thousands of frames in every file processed, so it multiplies.
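Something like this cProfile harness makes the split visible (a minimal sketch; `frames` and `graph` are assumed to be set up as for the function at the end of this post):

import cProfile
import pstats

profiler = cProfile.Profile()
processed = profiler.runcall(process_frames, frames, graph)  # function shown below
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)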

You can alleviate pull overhead by pushing more often and letting the results build up in the underlying C buffer; then, when you pull, you pull a batch all at once. But I suspect letting frames build up in an underlying C buffer still incurs a lot of reallocating, because there's no way to tell the underlying function how much data to expect. And this workaround only helps on the pull side; it doesn't help at all with batch pushing.
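For concreteness, here's a minimal sketch of that pull-side workaround (assuming the same Graph.push/Graph.pull behavior as the function at the end of this post): push all input up front, then drain the sink in one batch.

import av

def process_frames_batched(frames, graph):
    processed = []
    # push everything first so output accumulates in the underlying C buffers
    for frame in frames:
        graph.push(frame)
    graph.push(None)  # signal end of input
    # then pull the accumulated output in one batch
    while True:
        try:
            processed.append(graph.pull())
        except (av.BlockingIOError, av.EOFError):
            # EOF means fully drained; EAGAIN shouldn't persist once
            # input is flushed, so this sketch treats both as "done"
            return processed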

Threading will divide the overhead between threads, but the aggregate overhead (roughly (py_to_c_conversion_t + c_to_py_conversion_t) * avg_frames_per_file * files) won't be decreased, just divided. And that's not enough to offset this sizeable performance hit. Maybe if you have a Threadripper... a man can dream.
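To put illustrative numbers on that (assumed figures, not measurements):

# back-of-envelope with assumed per-frame conversion costs
py_to_c_conversion_t = 2e-6   # seconds per Py -> C frame conversion (assumed)
c_to_py_conversion_t = 2e-6   # seconds per C -> Py frame conversion (assumed)
avg_frames_per_file = 5000
files = 500
threads = 8

total = (py_to_c_conversion_t + c_to_py_conversion_t) * avg_frames_per_file * files
print(total)            # 10.0 -- seconds of pure conversion overhead
print(total / threads)  # 1.25 -- threading divides the cost, it doesn't remove it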

I think the only solution is to add a layer of C wrappers around ffmpeg's filtering functions that essentially just hold the converted data (Py to C), pass it to the underlying ffmpeg functions, poll for completion, accumulate the returned data, and return once all processed frames are available. Essentially this would enable efficient batch processing and drastically reduce overhead.
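The interface I have in mind would look something like this (hypothetical: nothing below exists in PyAV today, and the loop body would have to live in Cython/C):

class BatchFilterGraph:
    """Hypothetical wrapper around a configured av.filter.Graph."""

    def __init__(self, graph):
        self.graph = graph

    def process_batch(self, frames):
        """Would convert all `frames` Py -> C in one pass, run the entire
        push/poll/pull loop in C against the underlying ffmpeg filter
        functions, accumulate the output AVFrames, and convert them back
        to Python objects in a single pass at the end."""
        raise NotImplementedError("would require new Cython in PyAV")

That way, one Py<->C boundary crossing per batch replaces one per frame.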

I read through the parts of the PyAV documentation that looked potentially relevant, as well as the GitHub example file for audio processing (my use case). I apologize if I'm missing something and there's already a neat way to do this. This may also just be outside the scope of the project, but I wanted to try to contribute a bit in my own very limited way by bringing it up :)

It also occurred to me that some part of ffmpeg itself must handle passing frames to the underlying filter functions, so maybe that would be the place to look? But I couldn't get the project to build, and the source code was way over my head, so I'm kind of stuck on that front.

Here's how I was processing frames (link to the file):

import av


def process_frames(frames, graph):
    processed_frames = []
    frame_iter = iter(frames)
    pending = None  # frame taken from the iterator but not yet accepted by the graph
    has_frames_to_push = True
    while True:
        # try to push the next input frame, if available
        if has_frames_to_push:
            try:
                if pending is None:
                    pending = next(frame_iter)
                graph.push(pending)
                pending = None
            except StopIteration:
                has_frames_to_push = False
                graph.push(None)  # signal end of input
            except (av.BlockingIOError, av.EOFError):
                # benign: the graph just isn't ready for more input yet;
                # keep `pending` so the frame is retried, not dropped
                pass

        # poll to pull any available frames
        while True:
            try:
                f = graph.pull()
                if f is None and not has_frames_to_push:
                    # input is flushed and nothing is left to pull: done
                    return processed_frames
                elif f is not None:
                    processed_frames.append(f)
                break  # not done yet; go push or poll more frames
            except av.BlockingIOError:
                # the graph isn't ready to produce output yet;
                # go back and try to push more input
                break
            except av.EOFError:
                # some implementations signal completion via this error :')
                return processed_frames
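For context, the setup around that function looks roughly like this (a minimal sketch using PyAV's filter Graph API; the filename and the volume filter are placeholders):

import av
import av.filter

container = av.open("input.mp3")  # placeholder filename
stream = container.streams.audio[0]

graph = av.filter.Graph()
src = graph.add_abuffer(template=stream)  # source fed by decoded frames
flt = graph.add("volume", "0.5")          # placeholder filter
sink = graph.add("abuffersink")           # sink that processed frames are pulled from
src.link_to(flt)
flt.link_to(sink)
graph.configure()

processed = process_frames(container.decode(stream), graph)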
