# Feature Proposal: Real-Time Streaming MIDI Output Support
## Summary
This feature request proposes adding real-time streaming MIDI output to `basic-pitch`, allowing the system to process and output MIDI data concurrently with audio input. This would significantly expand the usability of `basic-pitch` in live performance, educational, and DAW integration contexts.
## Motivation
Currently, `basic-pitch` operates as a batch audio-to-MIDI converter, requiring the full audio file to be processed before producing MIDI output. While effective for offline applications, this architecture limits the tool’s applicability in live scenarios. Demand for real-time audio-to-MIDI conversion is growing in:
- Live instrument-to-MIDI conversion for digital audio workstations (DAWs)
- Music education platforms requiring instant feedback
- Interactive composition and improvisation tools
- Low-latency MIDI controllers for experimental performance setups
Several commercial and research-grade tools provide real-time capabilities (e.g., JamOrigin MIDI Guitar, AIO MIDINet, and various ONNX-based pipelines), but few offer open-source solutions with the transcription accuracy that `basic-pitch` provides.
## Proposed Implementation
A modular, low-latency real-time streaming pipeline could be introduced as an extension of the existing model. Suggested steps include:
### Input Handling
- Use `pyaudio`, `sounddevice`, or other low-latency libraries to stream audio input directly from a microphone or system source.
- Implement windowed audio buffering with overlap to allow continuous model inference (see the sketch below).
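As a rough illustration, a minimal capture loop with `sounddevice` might look like the following. The window and hop sizes and the 22050 Hz sample rate are illustrative placeholders, not settled design choices:

```python
import queue

import numpy as np
import sounddevice as sd

SAMPLE_RATE = 22050   # Hz; illustrative, would need to match the model's expected rate
WINDOW = 4096         # samples per analysis window
HOP = 2048            # 50% overlap between consecutive windows

blocks = queue.Queue()

def audio_callback(indata, frames, time, status):
    # Runs on the audio thread: copy the mono channel into the queue.
    if status:
        print(status)
    blocks.put(indata[:, 0].copy())

def windows():
    # Assemble overlapping fixed-size windows from the incoming blocks.
    buf = np.zeros(0, dtype=np.float32)
    while True:
        buf = np.concatenate([buf, blocks.get()])
        while len(buf) >= WINDOW:
            yield buf[:WINDOW]
            buf = buf[HOP:]   # slide forward by one hop, keeping the overlap

with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, callback=audio_callback):
    for window in windows():
        pass  # hand `window` to the inference stage
```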
### Inference Adaptation
- Adapt the inference loop to process fixed-size frames (e.g., 2048 or 4096 samples) in real time.
- Introduce incremental model state management to carry context across audio frames (sketched below).
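A minimal sketch of what the incremental loop could look like, assuming a hypothetical `run_model` callable that maps one window of samples to per-pitch activations (`basic-pitch` exposes no such streaming interface today, so this shows the intended shape only):

```python
import numpy as np

class StreamingInference:
    """Frame-by-frame inference with simple cross-frame smoothing."""

    def __init__(self, run_model, n_pitches=88, alpha=0.6):
        self.run_model = run_model        # window of samples -> (n_pitches,) activations
        self.alpha = alpha                # smoothing factor carried across frames
        self.state = np.zeros(n_pitches)  # running per-pitch activation estimate

    def step(self, window: np.ndarray) -> np.ndarray:
        # Exponential smoothing keeps a little temporal context between
        # frames and damps spurious single-frame detections.
        acts = self.run_model(window)
        self.state = self.alpha * self.state + (1 - self.alpha) * acts
        return self.state
```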
### Streaming Output
- Emit MIDI note events incrementally using a ring buffer or FIFO stream.
- Optionally expose a MIDI output via `mido`, `rtmidi`, or similar libraries for live routing to DAWs or synthesizers (see the sketch below).
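For the live-routing idea, a hedged sketch using `mido` could look like this; the hysteresis thresholds and the mapping of pitch index 0 to MIDI note 21 (A0, 88-key layout) are assumptions, not `basic-pitch` behavior:

```python
import mido
import numpy as np

ON_THRESHOLD = 0.5    # assumed activation level that triggers a note_on
OFF_THRESHOLD = 0.3   # hysteresis: release threshold lower than attack

port = mido.open_output()   # default system MIDI output port
active = set()              # MIDI note numbers currently sounding

def emit(activations: np.ndarray) -> None:
    # Translate per-pitch activations into note_on/note_off messages.
    for pitch, act in enumerate(activations):
        note = pitch + 21   # assume index 0 maps to MIDI note 21 (A0)
        if act >= ON_THRESHOLD and note not in active:
            port.send(mido.Message('note_on', note=note,
                                   velocity=min(int(act * 127), 127)))
            active.add(note)
        elif act < OFF_THRESHOLD and note in active:
            port.send(mido.Message('note_off', note=note))
            active.discard(note)
```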
### Latency and Performance Tuning
- Introduce a tunable latency buffer to balance transcription accuracy against real-time responsiveness.
- Profile model inference to determine optimal window sizes and overlaps under typical hardware constraints (a simple profiling sketch follows below).
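One simple way to ground the tuning discussion is to time inference against the real-time budget of each candidate window size. The sketch below reuses the hypothetical `run_model` from earlier; all numbers are hardware-dependent:

```python
import time

import numpy as np

def profile(run_model, sample_rate=22050, window_sizes=(2048, 4096, 8192), n_runs=50):
    # Compare per-window inference time against the real-time budget.
    for window in window_sizes:
        dummy = np.random.randn(window).astype(np.float32)
        start = time.perf_counter()
        for _ in range(n_runs):
            run_model(dummy)
        per_window = (time.perf_counter() - start) / n_runs
        budget = window / sample_rate   # seconds of audio covered per window
        verdict = "OK" if per_window < budget else "too slow"
        print(f"window={window}: {per_window * 1000:.1f} ms "
              f"(budget {budget * 1000:.1f} ms, {verdict})")
```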
### Optional Network Interface
- For advanced use cases, expose the real-time inference through a lightweight WebSocket or gRPC API, enabling remote control and cloud deployment (sketched below).
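As one possible shape for this, here is a minimal broadcast server using the third-party `websockets` package; the JSON message format is invented for illustration, not a proposed protocol:

```python
import asyncio
import json

import websockets

clients = set()

async def handler(websocket):
    # Track each connected client until it disconnects.
    clients.add(websocket)
    try:
        await websocket.wait_closed()
    finally:
        clients.discard(websocket)

async def broadcast(note: int, on: bool):
    # Push a note event to every connected client.
    msg = json.dumps({"note": note, "on": on})
    for ws in list(clients):
        try:
            await ws.send(msg)
        except websockets.ConnectionClosed:
            clients.discard(ws)

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()   # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```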
## Anticipated Challenges
- Model Adaptability: Ensuring the model performs well on partial inputs without full temporal context.
- Latency Minimization: Achieving real-time responsiveness while maintaining accuracy will require careful tuning.
- False Positives: Low-duration notes may introduce noise in real-time environments, so adaptive thresholding or smoothing may be necessary (see the sketch after this list).
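To make the smoothing idea concrete, one cheap option is a minimum-duration gate that only confirms a note after it has persisted for a few hops; `MIN_FRAMES` is an arbitrary illustrative value:

```python
MIN_FRAMES = 3   # hops a pitch must persist before being emitted (illustrative)

pending = {}     # pitch -> consecutive hops seen above threshold

def debounce(active_pitches: set) -> set:
    # Only confirm pitches that have been active for MIN_FRAMES hops.
    confirmed = set()
    for pitch in active_pitches:
        pending[pitch] = pending.get(pitch, 0) + 1
        if pending[pitch] >= MIN_FRAMES:
            confirmed.add(pitch)
    for pitch in list(pending):
        if pitch not in active_pitches:
            del pending[pitch]   # reset the counter once the pitch drops out
    return confirmed
```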
## Benefits to the Ecosystem
- Adds live performance capabilities to the `basic-pitch` ecosystem
- Opens opportunities for integration with VSTs, DAWs, and educational tools
- Fills a notable gap in the open-source music transcription landscape
## Conclusion
Adding real-time streaming MIDI output to `basic-pitch` would make the tool significantly more versatile and competitive with proprietary solutions. Given its high transcription accuracy and open architecture, `basic-pitch` is well-positioned to lead in this space. This feature would serve both the open-source community and professional musicians seeking reliable, low-latency audio-to-MIDI conversion.
I’d be happy to contribute or assist with prototyping this functionality.