Replace pydub as dependency and use multithreaded resampling (exports 20x faster) #8
Draft: Bentroen wants to merge 13 commits into main from refactor/replace-pydub
Bye bye pydub!
- The `Mixer` class now uses a 2D numpy array for all operations (I don't know why it didn't use one before) instead of a 1D array with one entry per sample *per channel*.
- Sounds are now processed internally as `float32` instead of `int16` (with an oversized `int32` array for mixing), mostly as a consequence of it being the output format for `samplerate`. Fortunately, it is a good outcome - float arrays can just handle the overflow we throw at them.

This commit strives to change as little as possible of the internal workings, as it is a refactor. But this replacement opens up many opportunities for refactoring further aspects of the code, and even making it more efficient - since we now don't rely on handling audio data the way `pydub` expects us to.

Fixes #6
This commit implements the change proposed in tuxu/python-samplerate#14 via issue tuxu/python-samplerate#13
`nbswave` was 'syncing' all segments to the same sample rate before setting the speed and overlaying them, mostly because `pydub` (the library we used before) did it that way. As a result, we actually resampled most of the notes twice. This is not needed: we can do a single resampling operation by storing the original sound's sample rate and factoring it in when multiplying the speed. For instance, to make a 48 kHz file 1.5x faster with a target sample rate of 44.1 kHz, we do: `1.5 * 48000 / 44100`
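The arithmetic above can be sketched as follows. This is only an illustration, not the code in this branch: `resample_ratio` is a hypothetical helper, and it assumes the libsamplerate-style convention where the ratio passed to the resampler is output length divided by input length, so it is the inverse of the effective speed.

```python
def resample_ratio(speed: float, source_rate: int, target_rate: int) -> float:
    """Return the output/input length ratio for one combined resampling pass.

    Hypothetical helper: folds the playback speed and the sample rate
    conversion into a single factor instead of resampling twice.
    """
    if speed <= 0 or source_rate <= 0 or target_rate <= 0:
        raise ValueError("speed and sample rates must be positive")
    # Effective speed once the source rate is factored in,
    # e.g. 1.5 * 48000 / 44100 for the example in the commit message.
    effective_speed = speed * source_rate / target_rate
    # Playing faster means fewer output samples, hence the inverse.
    return 1.0 / effective_speed


# Single pass: 48 kHz sound, 1.5x faster, mixed at 44.1 kHz.
single_pass = resample_ratio(1.5, 48000, 44100)

# The two-pass approach multiplies out to the same overall factor.
two_pass = (44100 / 48000) * (1 / 1.5)
```

Both approaches yield the same number of output samples, but the single pass runs the (expensive) resampler once per segment instead of twice.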
Avoids having to deal with channel conversion/'stacking' at the end, when every individual resampled segment would have to be duplicated. Attempted to use `np.column_stack`, `np.repeat` and `np.tile` to enforce stereo only when we know it's needed (in the `apply_gain_stereo` method), and they all account for the bulk of the performance bottleneck when run over all resampled segments.
This pull request aims to tackle the multiple issues described in #6.
Changes
In its current state, the branch can export files up to 20x faster than v0.4.0 by using multithreaded resampling. It replaces `pydub` (unmaintained and backed by the deprecated `audioop` module) with `python-samplerate`, a wrapper for libsamplerate. The methods in use from `pydub` have been ported to our library's code and adapted to work with `numpy` arrays, which by itself already makes exports a whole lot faster. But that's not all!
By virtue of a yet unmerged pull request (tuxu/python-samplerate#14), `samplerate` releases the GIL while resampling, letting us leverage the full power of the CPU.
The exporter does so by precalculating all the resampling operations that have to be made, for the entire song, and batch-submitting them to `concurrent.futures` so they can be spread across different CPU threads. As each operation is completed, the resampled sounds are then overlaid into the final song file with the proper volume and panning.
Each operation also stores a context containing every position this sound has to be 'stamped' at in the final audio file, as well as the panning and volume each of those instances has to be played at. Segments with the volume and panning applied are also reused in all places that they appear, avoiding many multiplication operations. This was already done in v0.4.0, but as the pitch and panning operations applied in `audioop` were really slow, this optimization didn't really shine.
Together, these techniques let the export method skip a lot of work entirely. Knowing details of the structure of a typical note block song, such as the use of a limited pool of variations in pitch, actually proved really useful in designing an efficient system.
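A minimal sketch of the batching scheme described above, under stated assumptions: the names `export` and `resample_linear` are hypothetical, the audio is mono with volume only (no panning), and the resampler is a pure-Python linear interpolation standing in for `samplerate`. The stand-in does not release the GIL, so this shows the structure of the approach rather than the speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed


def resample_linear(samples, ratio):
    """Stand-in resampler (linear interpolation); output length ~ len * ratio."""
    out_len = max(1, round(len(samples) * ratio))
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i / ratio
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def export(notes, sounds, length):
    """Mix notes given as (sound_name, ratio, position, volume) tuples."""
    # Precalculate: one job per unique (sound, ratio) pair. Its context is
    # the list of every position and volume where that segment is stamped.
    jobs = defaultdict(list)
    for name, ratio, position, volume in notes:
        jobs[(name, ratio)].append((position, volume))

    track = [0.0] * length
    with ThreadPoolExecutor() as pool:
        # Batch-submit all resampling operations up front.
        futures = {
            pool.submit(resample_linear, sounds[name], ratio): placements
            for (name, ratio), placements in jobs.items()
        }
        # Overlay each segment as soon as its resampling finishes. Mixing
        # happens only in this thread, so the track needs no locking.
        for future in as_completed(futures):
            segment = future.result()
            for position, volume in futures[future]:
                for offset, sample in enumerate(segment):
                    if position + offset >= length:
                        break
                    track[position + offset] += sample * volume
    return track
```

Because notes are grouped by `(sound, ratio)` before submission, a song that reuses a small pool of pitches triggers far fewer resampling operations than it has notes.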
It also avoids unnecessary conversions to sync the sound files to the sample rate, sample width and channel count of the mixer. For instance, if mixing at 44.1 kHz and loading an audio file at 48 kHz (e.g. all the instruments added after 1.12), nbswave v0.4.0 would first convert the segment to 44.1 kHz (one resampling operation), then resample it again to get it at the proper pitch. This was wasteful, as it can be done in a single operation: when applying the actual pitch conversion, we just have to multiply the resampling factor by `44100 / 48000`.
Additionally, the math used internally in all audio operations now uses `float32` arrays, as a consequence of this being the format returned by `libsndfile` and that `libsamplerate` works with. This means it's no longer necessary to use an oversized array to avoid clipping, as was previously done (`int16` segments mixed in an `int32` array), as the float format can hold sums beyond full scale without overflowing.
Finally, the track is now mixed at the target sample rate and channel count, rather than converted/resampled only at the end of the process, which should make both the output more accurate and processing faster.
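To illustrate the point about `float32` mixing, here is a small sketch. It assumes samples scaled to the [-1.0, 1.0] range and a peak normalization step before converting back to `int16`; `mix_float32` and `to_int16` are hypothetical names, and how the branch actually handles the final conversion is not stated here.

```python
import numpy as np

# Two loud int16 segments whose sum exceeds the int16 range.
a = np.full(4, 30000, dtype=np.int16)
b = np.full(4, 10000, dtype=np.int16)

# Mixing directly in int16 silently wraps around: 40000 - 65536 = -25536.
wrapped = a + b


def mix_float32(segments):
    """Sum int16 segments into a float32 track scaled to [-1.0, 1.0]."""
    track = np.zeros(len(segments[0]), dtype=np.float32)
    for segment in segments:
        track += segment.astype(np.float32) / 32768.0
    return track


def to_int16(track):
    """Normalize only if the peak exceeds full scale, then convert."""
    peak = float(np.max(np.abs(track)))
    if peak > 1.0:
        track = track / peak
    return np.round(track * 32767.0).astype(np.int16)


mixed = mix_float32([a, b])  # peaks at about 1.22, no wraparound
out = to_int16(mixed)        # scaled back into the int16 range
```

The float track simply holds values above 1.0 until the end, which is why no oversized intermediate array is needed; the level still has to be brought back into range once, when writing integer output.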
Results
Here's a comparison of the elapsed export time when exporting the demo file included in the repository:
For a typical, 3-minute song, exporting shouldn't take longer than 15 seconds at the best resampling method.
With the outlined changes, the export time for the Megacollab file (250k+ notes) has decreased from 8 minutes down to under a minute at the best resampling method available (`sinc_best`), and just under 40 seconds with the cheapest (`linear`).
To-do
There are still a few things left to polish and explore before merging this pull request:
- `ffmpeg` may be necessary for some of the formats?) https://libsndfile.github.io/libsndfile/formats.html
- `ResamplingMethod` enum?)