
Replace pydub as dependency and use multithreaded resampling (exports 20x faster) #8


Draft · wants to merge 13 commits into main
Conversation

@Bentroen (Member) commented on Dec 21, 2024

This pull request aims to tackle the multiple issues described in #6.

Changes

In its current state, the branch is capable of exporting files up to 20x faster than v0.4.0 by using multithreaded resampling. It replaces pydub (unmaintained and backed by the deprecated audioop module) with python-samplerate, a wrapper for libsamplerate. The methods we used from pydub have been ported into our library's code and adapted to work with numpy arrays, which by itself already makes exports a whole lot faster. But that's not all!

By virtue of a yet unmerged pull request (tuxu/python-samplerate#14), samplerate releases the GIL while resampling, letting us leverage the full power of the CPU:

[Screenshots: CPU usage while exporting]

It does so by precalculating all the resampling operations that have to be made, for the entire song, and batch-submitting them to concurrent.futures so they can be spread across different CPU threads. As each operation is completed, the resampled sounds are then overlaid into the final song file with the proper volume and panning.
Each operation also stores a context containing every position this sound has to be 'stamped' at in the final audio file, as well as the panning and volume each of those instances has to be played at. Segments with the volume and panning applied are also reused in all places where they appear, avoiding many multiplication operations. This was already done in v0.4.0, but since the pitch and panning operations applied via audioop were really slow, that optimization didn't really shine.
Together, these techniques enable a clever optimization of the export method that lets us avoid a lot of work entirely. Knowing details of the structure of a typical note block song, such as its limited pool of pitch variations, proved really useful in designing an efficient system.
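
For illustration, here's a minimal sketch of that batching scheme. The names (`ResampleJob`, `run_jobs`, `placements`) are hypothetical, not the actual implementation, and the mixing step is heavily simplified (no panning, no bounds checking):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field

import numpy as np
import samplerate


@dataclass
class ResampleJob:
    """One unique (sound, pitch) combination and every place it appears."""
    samples: np.ndarray  # float32, shape (frames, channels)
    ratio: float         # combined resampling factor for this pitch
    placements: list = field(default_factory=list)  # (position, volume) pairs


def run_jobs(jobs: list[ResampleJob], track: np.ndarray) -> None:
    # samplerate releases the GIL while resampling, so a plain thread pool
    # is enough to spread the heavy work across CPU cores.
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(samplerate.resample, j.samples, j.ratio, "sinc_best"): j
                   for j in jobs}
        for future in as_completed(futures):
            job = futures[future]
            resampled = future.result().astype(np.float32)
            # Stamp the same resampled sound at every position it occurs.
            for position, volume in job.placements:
                end = position + len(resampled)
                track[position:end] += resampled * volume
```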

It also cleverly avoids unnecessary conversions to sync the sound files to the sample rate, sample width and channel count of the mixer. For instance, when mixing at 44.1 kHz and loading an audio file at 48 kHz (e.g. all the instruments added after 1.12), nbswave v0.4.0 would first convert the segment to 44.1 kHz (one resampling operation), then resample it again to get it at the proper pitch. This was wasteful and unnecessary, as it can be done in a single operation: when applying the actual pitch conversion, we just have to multiply the resampling factor by 44100 / 48000.
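
As a quick sketch of that combined factor (names here are illustrative, not nbswave's actual code): libsamplerate's ratio is output length over input length, so the rate conversion and the pitch shift fold into a single multiplication.

```python
def combined_ratio(source_rate: int, target_rate: int, pitch: float) -> float:
    # Converting 48 kHz -> 44.1 kHz alone would use 44100 / 48000;
    # raising the pitch by `pitch` shortens the output by a further 1 / pitch.
    return (target_rate / source_rate) / pitch

# One resampling pass instead of two:
ratio = combined_ratio(48000, 44100, pitch=1.5)  # ≈ 0.6125
```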

Additionally, the math used internally in all audio operations now uses float32 arrays, as a consequence of this being the format returned by libsndfile and the format libsamplerate works with. This means it's no longer necessary to use an oversized array to avoid clipping, as was previously done (int16 segments mixed into an int32 array): the float format has enough headroom to avoid clipping entirely during mixing.
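
A small illustration of the headroom argument (not library code): intermediate sums in a float32 buffer may go past ±1.0 without wrapping around, so clipping only has to be handled once, when the final file is written.

```python
import numpy as np

mix = np.zeros((44100, 2), dtype=np.float32)
segment = np.full((1000, 2), 0.9, dtype=np.float32)

# Overlap several loud segments: the running sum exceeds 1.0, which an
# int16 accumulator could not represent, but float32 handles it fine.
for _ in range(5):
    mix[:1000] += segment

mix = np.clip(mix, -1.0, 1.0)  # applied once, at export time
```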

Finally, the track is now mixed at the target sample rate and channel count, rather than being converted/resampled only at the end of the process, which should make the output more accurate and the processing faster.

Results

Here's a comparison of the elapsed export time when exporting the demo file included in the repository:

[Charts: elapsed export time for the demo file, v0.4.0 vs. this branch]

For a typical 3-minute song, exporting shouldn't take longer than 15 seconds with the best resampling method.

With the outlined changes, the export time for the Megacollab file (250k+ notes) has decreased from 8 minutes to under a minute with the best resampling method available (sinc_best), and to just under 40 seconds with the cheapest (linear).

To-do

There are still a few things to polish and explore before merging this pull request:

  • Optimize panning calculation (pydub expected gain to be provided in dB, but the methods are our own now, so we may freely change the implementation)
  • Check if libsndfile supports all previously supported formats, and whether it can auto-detect the target format based on the filename (a direct call to ffmpeg may be necessary for some formats?)
    https://libsndfile.github.io/libsndfile/formats.html
  • Check compatibility of the existing public interface with the new export method
  • Make mono tracks be mixed in mono, and stereo tracks be mixed in stereo (it currently works only with stereo signals internally)
  • Explore further work avoidance by resampling audio segments in mono, and only splitting the channels when panning must be applied (could lead to further optimization, but could also make it slower due to the need to duplicate the signal later in the chain, leading to more operations)
  • Refactor and clean up multithreaded resampling logic
  • Possibly allow the resampling method to be picked at export time (a ResamplingMethod enum? see the sketch after this list)
  • Check memory usage of submitting all resampling operations at once
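
Regarding the ResamplingMethod idea above, one possible shape for such an enum (purely speculative, not part of this PR) would simply map to libsamplerate's converter names:

```python
from enum import Enum

class ResamplingMethod(Enum):
    # Values map directly to libsamplerate converter names.
    SINC_BEST = "sinc_best"
    SINC_MEDIUM = "sinc_medium"
    SINC_FASTEST = "sinc_fastest"
    ZERO_ORDER_HOLD = "zero_order_hold"
    LINEAR = "linear"

# e.g. samplerate.resample(samples, ratio, ResamplingMethod.SINC_BEST.value)
```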

- The `Mixer` class now uses a 2D numpy array for all operations (I don't know why it didn't use one before) instead of a 1D array with one entry per sample *per channel* (see the sketch after this list).
- Sounds are now processed internally as `float32` instead of `int16` (with an oversized `int32` array for mixing), mostly as a consequence of it being the output format for `samplerate`. Fortunately, it is a good outcome - float arrays can just handle the overflow we throw at them.
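
A rough sketch of the layout change described in the first bullet (shapes only; the real `Mixer` does far more than this):

```python
import numpy as np

seconds, rate = 3, 44100

old_layout = np.zeros(seconds * rate * 2, dtype=np.int16)     # interleaved: one entry per sample per channel
new_layout = np.zeros((seconds * rate, 2), dtype=np.float32)  # 2D: (samples, channels)

# With the 2D layout, per-channel gains broadcast in one operation:
new_layout[:1000] += np.full((1000, 2), 0.5, dtype=np.float32) * np.array([0.8, 0.6], dtype=np.float32)
```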

This commit strives to change as little as possible of the internal workings, as it is a refactor. But this replacement opens up many opportunities for refactoring further aspects of the code, and even making it more efficient - since we now don't rely on handling audio data the way `pydub` expects us to.

Fixes #6
`nbswave` was 'syncing' all segments to the same sample rate before setting the speed and overlaying them. This is mostly because `pydub` (the library we used before) did it that way, so we actually resampled most of the notes twice. This, however, is not needed: we can do a single resampling operation by storing the original sound's sample rate and factoring it into the speed multiplier.

For instance, to make a 48kHz file 1.5x faster with target sample rate 44.1kHz, we do:
`1.5 * 48000 / 44100`
Avoids having to deal with channel conversion/'stacking' at the end, when every individual resampled segment would otherwise have to be duplicated.

Attempted to use `np.column_stack`, `np.repeat` and `np.tile` to enforce stereo only when we know it's needed (at the `apply_gain_stereo` method), and they all caused the bulk of the performance bottleneck when run over all resampled segments.
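
For reference, a simplified sketch of the kind of late stereo promotion that was benchmarked (not the final code); the `column_stack` copy is the part that dominated when applied to every resampled segment:

```python
import numpy as np

def apply_gain_stereo(samples: np.ndarray, left_gain: float, right_gain: float) -> np.ndarray:
    if samples.ndim == 1:
        # Mono segment: duplicate it into two channels right before panning.
        # This copy is cheap for one segment but adds up across thousands.
        samples = np.column_stack((samples, samples))
    return samples * np.array([left_gain, right_gain], dtype=np.float32)
```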