The WebCodecs standard exposes browser-internal decoding functionality to JavaScript. Currently, playing audio with these userspace decoders requires operations across multiple threads:
- A decoder instance is created on the control thread. The instance is fed data, and a decode is requested.
- Decoding happens asynchronously, likely on a dedicated thread.
- When decoding finishes, it notifies the control thread, which can then create and connect an AudioBufferSourceNode.
- This sends a message to the Web Audio rendering thread, which can now play the audio.
(Playing sound decoded with decodeAudioData is similar.)
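For concreteness, here is a minimal sketch of this flow today. It rests on assumptions not in the proposal: Opus is used so `configure()` needs no codec description, the decoder is assumed to produce f32-planar output, and the source of `EncodedAudioChunk` objects is left abstract. Every decoded buffer makes a round trip through the control thread before the rendering thread sees it:

```ts
const ctx = new AudioContext();
let nextStartTime = ctx.currentTime;

const decoder = new AudioDecoder({
  // Runs back on the control thread after the asynchronous decode completes.
  output: (data: AudioData) => {
    // Copy the decoded samples into an AudioBuffer.
    const buffer = ctx.createBuffer(data.numberOfChannels, data.numberOfFrames, data.sampleRate);
    for (let ch = 0; ch < data.numberOfChannels; ch++) {
      const channel = new Float32Array(data.numberOfFrames);
      data.copyTo(channel, { planeIndex: ch, format: "f32-planar" });
      buffer.copyToChannel(channel, ch);
    }
    data.close();

    // Schedule the buffer back-to-back with the previous one; if this callback
    // runs late, nextStartTime is already in the past and playback gaps or glitches.
    const source = new AudioBufferSourceNode(ctx, { buffer });
    source.connect(ctx.destination);
    source.start(nextStartTime);
    nextStartTime += buffer.duration;
  },
  error: (e) => console.error(e),
});
decoder.configure({ codec: "opus", sampleRate: 48000, numberOfChannels: 2 });

// The control thread must keep feeding chunks and explicitly request decodes.
function enqueue(chunk: EncodedAudioChunk) {
  decoder.decode(chunk);
}
```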
Since the control thread may be busy with other work, this can add significant latency to playing encoded audio and may also produce gaps if that delay exceeds the duration of decoded audio already buffered for playback. Queueing buffers back-to-back this way is also subject to a race condition that can result in audible artifacts.
A hypothetical DecoderSourceNode could work around this:
```ts
declare class DecoderSourceNode extends AudioScheduledSourceNode {
  constructor(context: BaseAudioContext);
  decoder: AudioDecoder;
}
```

- The DecoderSourceNode takes a WebCodecs AudioDecoder as a property. Users can freely queue encoded audio buffers to the decoder, in any size, and will not need to explicitly decode them.
- When a DecoderSourceNode is actively playing and holds a decoder instance, the Web Audio implementation controls when and how much to decode as data is needed for rendering. Decoding could happen just in time on the rendering thread, or ahead of time on a dedicated thread and buffered, as long as enough data is available to render when needed.
- Attempting to manually decode from an AudioDecoder attached to a DecoderSourceNode, or to attach it to multiple nodes simultaneously, results in an error.
With this model, the control thread is only responsible for creating the DecoderSourceNode and periodically feeding it data; the audio renderer is responsible for decoding an appropriate amount as needed.
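A hypothetical usage sketch under the declaration above: the only new API surface is DecoderSourceNode itself. The codec configuration, the `encodedChunks` source, and the use of `decode()` as the queueing call are placeholders, since the proposal leaves the exact queueing mechanism open:

```ts
const ctx = new AudioContext();
declare const encodedChunks: EncodedAudioChunk[]; // encoded data from elsewhere

// The output callback is presumably unused once the decoder is attached to a
// node; decoded data would flow to the renderer instead.
const decoder = new AudioDecoder({
  output: () => {},
  error: (e) => console.error(e),
});
decoder.configure({ codec: "opus", sampleRate: 48000, numberOfChannels: 2 });

const node = new DecoderSourceNode(ctx);
node.decoder = decoder;
node.connect(ctx.destination);
node.start();

// The control thread only queues encoded data; the renderer decides when and
// how much of it to decode.
for (const chunk of encodedChunks) {
  decoder.decode(chunk); // placeholder for "queue encoded data to the node"
}
```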
We're using a version of this in our internal Web Audio implementation, and in my opinion it simplifies a very common use case and makes it more efficient: playing a sound effect in some compressed format like Vorbis. Today, users typically either decode the entire sound ahead of time, incurring extra latency and memory overhead, or implement their own decode-and-buffer-in-chunks scheme, which adds complexity and risks gaps.