Update on TorchAudio’s future #3902
Comments
Hi @scotts, thanks for reporting the status of torchaudio and future plans. I don't understand the decision to drop the C++/CUDA extensions... There should be more discussions on this before making the decision. Best wishes, Chin-Yun, Centre for Digital Music
@scotts thanks for the update!
About `lfilter`, it would be nice to match the scipy precision and behaviour. I understand the big picture, but this means a lot of work.
@yoyolicoris, @christhetree, thanks for taking the time to reply. I understand that removing C++ implementations may be a performance regression for those components. I would like to further explain the motivation for why removing this C++ code specifically improves the long-term health of TorchAudio:
In the update, we did say: "We are exploring options to retain C++-backed APIs, but this is unlikely." Specifically, that exploration is about whether we can take advantage of a new effort in PyTorch 2.7: a stable ABI. That only addresses point 3, but addressing point 3 could greatly reduce the cost of point 2. The cost of point 1 would still stand, though. For those interested in retaining various C++ components, let us know if you have the capacity to explore porting these components to the stable ABI. That changes the maintenance cost equation.
Maybe for some other C++ components, the model could be to factor them out into a separate repo which doesn't provide binary releases, supports only some GitHub Actions CI for testing, and relies on users building it themselves.

Also, for some C++ code, maybe a way forward would be to convert it to a pure C API (e.g. this could work for ffmpeg effects), to be called via ctypes (using the DLPack API or plain pointers to pass tensors for processing). This should eliminate the problem of the unstable PyTorch C++ ABI. Regarding ffmpeg effects, maybe they could also be moved to torchcodec, as working with ffmpeg filter chains would be a very useful feature...

Another useful component in torchaudio is the bindings to flashlight, but flashlight itself has been discontinued for several years now. So probably the best path there would be factoring out the flashlight C++ code and the Python bindings in torchaudio into a new standalone repo, like Nvidia did: https://github.com/nvidia-riva/riva-asrlib-decoder . This is already half-done in https://github.com/flashlight/text, but it would be nice to maybe move the Python bindings https://pytorch.org/audio/0.12.0/models.decoder.html next to it? Also, given that Flashlight itself is discontinued, maybe it is worth moving the decoder out of the Flashlight org, to the pytorch org?
Thank you for sharing this. Thanks for all the efforts.
Dear TorchAudio users,
TorchAudio is the most popular audio library for PyTorch. It has critical transforms, models and datasets that we know the community relies on. That is why we wanted to let the community know that we have started a refactoring effort to transition TorchAudio into a maintenance phase. This process will involve removal of some user-facing features. We have three goals we want to achieve with this effort:
The diagram below depicts the various components of TorchAudio. We have highlighted it according to the user-facing API changes that we are making:
Starting with TorchAudio 2.8 (expected around August 2025), APIs slated for removal will trigger a deprecation warning. These APIs will be fully removed in TorchAudio 2.9 (anticipated by the end of 2025).
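For readers who want to find affected call sites before the removal, here is a minimal sketch of how deprecation warnings can be collected with Python's standard `warnings` module. It uses only the standard library; `legacy_load` is a hypothetical stand-in for a deprecated API, and the use of the `DeprecationWarning` category is an assumption (the warning category TorchAudio actually emits is not stated here).

```python
import warnings


def legacy_load(path):
    """Hypothetical stand-in for an API that is slated for removal."""
    # Assumption: the library signals deprecation via DeprecationWarning.
    warnings.warn(
        "legacy_load is deprecated and will be removed in a future release",
        DeprecationWarning,
        stacklevel=2,
    )
    return path


def find_deprecated_calls(func, *args, **kwargs):
    """Call func and return the messages of any DeprecationWarning it raised."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure no warning is suppressed by the default filters.
        warnings.simplefilter("always")
        func(*args, **kwargs)
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]


messages = find_deprecated_calls(legacy_load, "example.wav")
for message in messages:
    print(message)
```

In a test suite, a similar effect can be had by running Python with `-W error::DeprecationWarning`, which turns such warnings into exceptions so that remaining uses fail loudly.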
Most of the APIs in the `transforms`, `functional`, `compliance.kaldi`, `models` and `pipelines` modules will remain. These are the APIs that we identified as the most popular and valuable ones. `lfilter` and `overdrive` will switch to pure-Python implementations, which might affect performance. We are exploring options to retain C++-backed APIs, but this is unlikely.

The decoding and encoding capabilities of TorchAudio for both audio and video data will migrate to TorchCodec, where we are consolidating all of PyTorch media decoding and encoding. TorchAudio’s decoding and encoding APIs will be deprecated from TorchAudio 2.8, and they will be removed in TorchAudio 2.9, so we encourage users to migrate to TorchCodec as soon as possible. TorchCodec already supports video and audio decoding, and encoding will be supported soon. While there isn't a direct 1:1 API mapping, the migration process should be smooth. Please report any issues in the TorchCodec repository.
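To make the `lfilter` performance concern concrete, the sketch below shows a reference IIR filter written as a plain Python loop over the standard difference equation. It is not TorchAudio's implementation; the function name `lfilter_reference` is hypothetical, and the argument order `(b, a, x)` follows `scipy.signal.lfilter` rather than any TorchAudio signature. The point is that each output sample depends on previous outputs, so the recursion is inherently sequential, which is why a compiled kernel is typically faster than an interpreted loop.

```python
from typing import List, Sequence


def lfilter_reference(
    b: Sequence[float], a: Sequence[float], x: Sequence[float]
) -> List[float]:
    """Apply an IIR filter using the direct-form difference equation.

    a[0] * y[n] = sum_k b[k] * x[n - k] - sum_{k >= 1} a[k] * y[n - k]
    """
    if len(a) == 0 or a[0] == 0:
        raise ValueError("a[0] must be non-zero")

    # Normalize so that the leading denominator coefficient is 1.
    a0 = a[0]
    b_norm = [coef / a0 for coef in b]
    a_norm = [coef / a0 for coef in a]

    y: List[float] = []
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward part: weighted sum of current and past inputs.
        for k, bk in enumerate(b_norm):
            if n - k >= 0:
                acc += bk * x[n - k]
        # Feedback part: weighted sum of past outputs (sequential dependency).
        for k in range(1, len(a_norm)):
            if n - k >= 0:
                acc -= a_norm[k] * y[n - k]
        y.append(acc)
    return y


# Impulse response of a one-pole filter y[n] = x[n] + 0.5 * y[n - 1].
print(lfilter_reference([1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0]))
```

A loop like this is convenient as a correctness oracle when comparing a faster implementation against expected behaviour.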
All other modules and APIs will be removed in TorchAudio 2.9.
We understand that these changes may be disruptive. We believe that they are unfortunately necessary, in order for us to guarantee TorchAudio’s stability in the future.