Hi maintainers,
I've been reviewing the inference code in DeepFilterNet and noticed several opportunities to reduce memory allocations and avoid redundant data copying. Below are three targeted suggestions that could improve runtime performance, especially on low-resource or real-time systems.
- Optimize rolling buffer updates using std::mem::swap
Before (per frame):

```rust
self.rolling_spec_buf_x.push_back(self.spec_buf.clone());
```
Proposed optimization:
Use std::mem::swap (O(1)) to recycle the oldest history buffer instead of cloning the current spectrum each frame; a sketch follows below. Alternatively, if the buffers can be moved rather than cloned, even better: that would eliminate the copy entirely.
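A minimal sketch of the swap-based rotation, shown for rolling_spec_buf_x with a plain VecDeque<Vec<f32>> standing in for the actual spectrum buffer type; the field names mirror the snippets above, but the element types and the history-length handling are simplifying assumptions on my part:

```rust
use std::collections::VecDeque;
use std::mem;

struct RollingState {
    rolling_spec_buf_x: VecDeque<Vec<f32>>, // rolling history of past frames
    spec_buf: Vec<f32>,                     // spectrum of the current frame
    history_len: usize,                     // fixed length of the history
}

impl RollingState {
    fn push_current_frame(&mut self) {
        if self.rolling_spec_buf_x.len() >= self.history_len {
            // Recycle the oldest buffer: swap its storage with the current
            // frame (an O(1) pointer swap) instead of cloning a fresh copy.
            let mut recycled = self.rolling_spec_buf_x.pop_front().unwrap();
            mem::swap(&mut recycled, &mut self.spec_buf);
            self.rolling_spec_buf_x.push_back(recycled);
            // `spec_buf` now holds the stale oldest frame; this is fine as
            // long as the next analysis step overwrites it completely.
        } else {
            // While the history is still filling up, one clone per frame
            // remains necessary.
            self.rolling_spec_buf_x.push_back(self.spec_buf.clone());
        }
    }
}
```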
- Avoid cloning in synthesis() by accepting slices instead of owned data
Before:

```rust
state.synthesis(
    spec_ch.to_owned().as_slice_mut().unwrap(), // clones ~4KB
    enh_out_ch.as_slice_mut().unwrap(),
);
```
Proposed change:
Modify synthesis() to accept immutable input:

```rust
pub fn synthesis(&mut self, input: &[Complex32], output: &mut [f32])
```

Then call without cloning:

```rust
state.synthesis(
    spec_ch.as_slice().unwrap(), // zero-copy
    enh_out_ch.as_slice_mut().unwrap(),
);
```
This eliminates an unnecessary 4KB allocation per channel per frame.
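To make the call pattern concrete, here is a hedged, self-contained sketch: SynthState, its scratch buffer, and the trivial "transform" are stand-ins rather than the actual DeepFilterNet synthesis, and the frame sizes are arbitrary. The point is only that the caller passes a borrowed slice and any in-place work happens on a buffer owned by the state:

```rust
use ndarray::Array1;
use num_complex::Complex32;

/// Stand-in for the synthesis state; `scratch` is a hypothetical internal
/// buffer that is reused every frame instead of cloning the caller's data.
struct SynthState {
    scratch: Vec<Complex32>,
}

impl SynthState {
    /// Takes the spectrum by shared reference; any mutation the transform
    /// needs happens on the internal scratch buffer.
    fn synthesis(&mut self, input: &[Complex32], output: &mut [f32]) {
        self.scratch.copy_from_slice(input);
        // Placeholder for the actual ISTFT / overlap-add: write the real
        // parts so the example stays self-contained and runnable.
        for (o, c) in output.iter_mut().zip(self.scratch.iter()) {
            *o = c.re;
        }
    }
}

fn main() {
    let spec_ch: Array1<Complex32> = Array1::from_elem(481, Complex32::new(1.0, 0.0));
    let mut enh_out_ch: Array1<f32> = Array1::zeros(481);
    let mut state = SynthState {
        scratch: vec![Complex32::new(0.0, 0.0); 481],
    };
    state.synthesis(
        spec_ch.as_slice().unwrap(),        // zero-copy view into the spectrum
        enh_out_ch.as_slice_mut().unwrap(), // write directly into the output
    );
}
```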
- Reuse pre-allocated input buffers for Tract model inference
Before:

```rust
let mut enc_emb = self.enc.run(tvec!(
    self.erb_buf.clone(), // ~128 bytes
    TValue::from(self.cplx_buf.clone().into_tensor()...) // ~4KB
))?;
```
Proposed optimization:
Pre-allocate and reuse the input vector to avoid repeated allocations:

```rust
let mut enc_input = self.enc_input_buffer.take();
enc_input[0] = self.erb_buf.clone();
enc_input[1] = TValue::from(...);
let mut enc_emb = self.enc.run(enc_input)?;
```
While cloning may still be unavoidable due to Tract’s ownership requirements, reusing the outer Vec reduces allocator pressure.
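For the reuse pattern itself, here is a generic, hedged sketch that is deliberately independent of Tract's API: Model, run(), and enc_input_buffer are hypothetical stand-ins, and since Tract's real run() takes its input TVec by value (as far as I can tell), the same idea there would mean taking the buffer out and rebuilding it (e.g. with std::mem::take) rather than borrowing it as below:

```rust
/// Hypothetical stand-in for a compiled model; it only borrows its inputs so
/// that the pre-allocated input slots can be reused across frames.
struct Model;

impl Model {
    fn run(&self, inputs: &[Vec<f32>]) -> Vec<f32> {
        // Dummy "inference": return a vector the size of the first input so
        // the example stays self-contained.
        vec![0.0; inputs[0].len()]
    }
}

struct EncState {
    enc: Model,
    enc_input_buffer: Vec<Vec<f32>>, // two pre-allocated input slots, reused every frame
}

impl EncState {
    fn infer_frame(&mut self, erb: &[f32], cplx: &[f32]) -> Vec<f32> {
        // Overwrite the existing slots in place; no per-frame Vec of inputs
        // and no reallocation as long as the capacities are sufficient.
        self.enc_input_buffer[0].clear();
        self.enc_input_buffer[0].extend_from_slice(erb);
        self.enc_input_buffer[1].clear();
        self.enc_input_buffer[1].extend_from_slice(cplx);
        self.enc.run(&self.enc_input_buffer)
    }
}
```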
These changes aim to minimize per-frame heap allocations and memory copies, which should improve latency and reduce allocator churn and cache misses. I'd be happy to submit a PR if these ideas align with the project's direction.
Thanks for your great work on DeepFilterNet!