
Proposal: Real-Time Streaming MIDI Output Support #171

@Anipaleja

Feature Proposal: Real-Time Streaming MIDI Output Support

Summary

This feature request proposes adding real-time streaming MIDI output to basic-pitch, allowing the system to transcribe audio and emit MIDI events while the input is still arriving, rather than only after a complete file has been processed. This would significantly expand the usability of basic-pitch in live performance, education, and DAW integration contexts.

Motivation

Currently, basic-pitch operates as a batch audio-to-MIDI converter, requiring the full audio file to be processed before producing MIDI output. While effective for offline applications, this architecture limits the tool's applicability in live scenarios. Demand for real-time audio-to-MIDI conversion is growing in:

  • Live instrument-to-MIDI conversion for digital audio workstations (DAWs)
  • Music education platforms requiring instant feedback
  • Interactive composition and improvisation tools
  • Low-latency MIDI controllers for experimental performance setups

Several commercial and research-grade tools provide real-time capabilities (e.g., JamOrigin MIDI Guitar, AIO MIDINet, and various ONNX-based pipelines), but few offer open-source solutions with the transcription accuracy that basic-pitch provides.

Proposed Implementation

A modular, low-latency real-time streaming pipeline could be introduced as an extension of the existing model. Suggested steps include:

Input Handling

  • Use pyaudio, sounddevice, or other low-latency audio I/O libraries to stream audio directly from a microphone or system source.
  • Implement windowed audio buffering with overlap so the model can run inference continuously, as in the sketch below.
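
As a concrete starting point for the buffering, here is a minimal sketch using sounddevice. The 22050 Hz rate matches the sample rate basic-pitch's model expects; the window and hop sizes are placeholders to be tuned, not values taken from the library:

```python
import queue

import numpy as np
import sounddevice as sd

SAMPLE_RATE = 22050   # basic-pitch's expected input rate
WINDOW_SIZE = 4096    # samples per inference window (placeholder)
HOP_SIZE = 2048       # 50% overlap between consecutive windows

audio_queue: "queue.Queue[np.ndarray]" = queue.Queue()

def _callback(indata, frames, time, status):
    """Push each incoming block onto a thread-safe queue."""
    if status:
        print(status)
    audio_queue.put(indata[:, 0].copy())  # keep the mono channel

def windows():
    """Yield fixed-size, overlapped windows from the live stream."""
    buf = np.zeros(0, dtype=np.float32)
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1,
                        dtype="float32", callback=_callback):
        while True:
            buf = np.concatenate([buf, audio_queue.get()])
            while len(buf) >= WINDOW_SIZE:
                yield buf[:WINDOW_SIZE].copy()
                buf = buf[HOP_SIZE:]  # advance by the hop, keep the overlap
```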

Inference Adaptation

  • Adapt the inference loop to process fixed-size frames (e.g., 2048 or 4096 samples) in real time.
  • Introduce incremental model state management so temporal context carries over between audio frames; one possible wrapper is sketched after this list.
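
The state-management piece is the most speculative part, since the published model was not designed for strictly causal input. The wrapper below is a hypothetical illustration: it keeps a short history of recent windows as left context and returns only the slice of output covering the newest window (the model call and output shape are assumptions, not basic-pitch's actual API):

```python
import numpy as np

class StreamingTranscriber:
    """Carries trailing audio context between inference calls."""

    def __init__(self, model, context_windows: int = 2):
        self.model = model                    # a loaded basic-pitch model
        self.context_windows = context_windows
        self._history: list = []              # most recent audio windows

    def process(self, window: np.ndarray) -> np.ndarray:
        # Stitch recent windows onto the new one so the model sees
        # some left context even though only the tail is "new".
        self._history = (self._history + [window])[-self.context_windows:]
        stitched = np.concatenate(self._history)
        # Schematic call: the real model takes batched input and
        # returns onset/note/contour posteriorgrams.
        posteriors = self.model(stitched[np.newaxis, :])
        n_new = int(posteriors.shape[1] * len(window) / len(stitched))
        return posteriors[:, -n_new:]  # frames covering the newest window
```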

Streaming Output

  • Emit MIDI note events incrementally using a ring buffer or FIFO stream.
  • Optionally expose a MIDI output via mido, rtmidi, or similar libraries for live routing to DAWs or synthesizers (see the sketch after this list).
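
For live routing, mido with the python-rtmidi backend already covers the output side. A minimal emitter might diff each frame's detected pitches against the currently sounding set and send only the changes (extracting pitch sets from the posteriorgrams is assumed to happen upstream):

```python
import mido

port = mido.open_output()   # default system MIDI output port
active: set = set()         # MIDI pitches currently sounding

def emit(frame_pitches: set, velocity: int = 80) -> None:
    """Diff the current frame against active notes and send events."""
    for pitch in frame_pitches - active:
        port.send(mido.Message("note_on", note=pitch, velocity=velocity))
    for pitch in active - frame_pitches:
        port.send(mido.Message("note_off", note=pitch))
    active.clear()
    active.update(frame_pitches)
```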

Latency and Performance Tuning

  • Introduce a tunable latency buffer to balance transcription accuracy against real-time responsiveness.
  • Profile model inference to determine optimal window sizes and overlaps under typical hardware constraints; a rough profiling harness is sketched below.
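
Before committing to a window size, it is worth checking that inference keeps up with the audio clock, i.e., that the real-time factor stays below 1. A rough harness might look like this, with run_inference standing in for the actual model call:

```python
import time

import numpy as np

SAMPLE_RATE = 22050

def profile(run_inference, window_sizes=(2048, 4096, 8192), trials=50):
    """Print per-window inference time and real-time factor."""
    for n in window_sizes:
        window = np.random.randn(n).astype(np.float32)
        run_inference(window)                    # warm-up call
        start = time.perf_counter()
        for _ in range(trials):
            run_inference(window)
        per_call = (time.perf_counter() - start) / trials
        budget = n / SAMPLE_RATE                 # seconds of audio per window
        print(f"{n:5d} samples: {per_call * 1e3:6.1f} ms/window, "
              f"real-time factor {per_call / budget:.2f}")
```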

Optional Network Interface

  • For advanced use cases, expose the real-time inference through a lightweight WebSocket or gRPC API, enabling remote control and cloud deployment. A minimal WebSocket sketch follows.
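
As a sketch of what that could look like with the websockets package (the JSON event schema and the queue handoff from the inference loop are assumptions; a production server would also fan events out to every connected client rather than draining a single shared queue):

```python
import asyncio
import json

import websockets

event_queue: asyncio.Queue = asyncio.Queue()  # filled by the inference loop

async def handler(websocket):
    """Stream transcribed note events to a connected client as JSON."""
    while True:
        event = await event_queue.get()   # e.g. {"type": "note_on", "pitch": 60}
        await websocket.send(json.dumps(event))

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()            # serve until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```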

Anticipated Challenges

  • Model Adaptability: Ensuring the model performs well on partial inputs without full temporal context.
  • Latency Minimization: Achieving real-time responsiveness while maintaining accuracy will require careful tuning.
  • False Positives: Short, spurious note detections may introduce noise in real-time environments, so adaptive thresholding or smoothing may be necessary (one simple hysteresis gate is sketched after this list).
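
On the false-positive point, a causal hysteresis gate with a minimum-duration requirement is one cheap mitigation: a pitch must cross a high threshold to turn on, drops off only below a lower threshold, and is reported only after persisting for a few frames. All thresholds below are illustrative:

```python
import numpy as np

def gate_notes(posteriors: np.ndarray, on_thresh: float = 0.6,
               off_thresh: float = 0.3, min_frames: int = 3) -> np.ndarray:
    """Binary note activations with hysteresis and a duration gate.

    posteriors: (n_frames, n_pitches) array of note probabilities.
    """
    n_frames, n_pitches = posteriors.shape
    active = np.zeros(n_pitches, dtype=bool)
    run = np.zeros(n_pitches, dtype=int)        # consecutive active frames
    out = np.zeros((n_frames, n_pitches), dtype=bool)
    for t in range(n_frames):
        # Active pitches stay on while above off_thresh; inactive
        # pitches turn on only when they exceed on_thresh.
        active = np.where(active, posteriors[t] > off_thresh,
                          posteriors[t] > on_thresh)
        run = np.where(active, run + 1, 0)
        out[t] = run >= min_frames              # duration gate
    return out
```

The duration gate delays each note-on by min_frames hops, which is usually an acceptable trade for fewer spurious flickers in a live setting.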

Benefits to the Ecosystem

  • Adds live performance capabilities to the basic-pitch ecosystem
  • Opens opportunities for integration with VSTs, DAWs, and educational tools
  • Fills a notable gap in the open-source music transcription landscape

Conclusion

Adding real-time streaming MIDI output to basic-pitch would make the tool significantly more versatile and competitive with proprietary solutions. Given its high transcription accuracy and open architecture, basic-pitch is well-positioned to lead in this space. This feature would serve both the open-source community and professional musicians seeking reliable, low-latency audio-to-MIDI conversion.

I’d be happy to contribute or assist with prototyping this functionality.
