Use a linked list of background tasks to perform #2879
Thanks for this! My brain will start thinking through this approach. It's not what I was thinking which means it is probably better. :-)
Discussion with @tannewt
Which modules to do next
bbd9436 to 7944f50
The initial comment has been heavily revised to reflect the current status of this PR. Things are progressing nicely. SAM D5x/E5x is probably done, though as always there may be problems that only become obvious after testing, such as in the next alpha release.
The performance gained back on the CLUE is much smaller... but it has not been given specific attention yet.
With the latest changes performance on nRF is a bit better than the 5.3.0 baseline (now testing on Particle Xenon).
I suspect the 5.3.0 performance of the Xenon is better than the CLUE because the latter has a displayio display enabled. However, I didn't test the theory.
0823ad7 to 2056ff6
The motivation for doing this is so that we can allow common_hal_mcu_disable_interrupts in IRQ context, something that works on other ports, but not on nRF with SD enabled. This is because when SD is enabled, calling sd_softdevice_is_enabled in the context of an interrupt with priority 2 or 3 causes a HardFault. We have chosen to give the USB interrupt priority 2 on nRF, the highest priority that is compatible with SD.
Since at least SoftDevice s130 v2.0.1, sd_nvic_critical_region_enter/exit have been implemented as inline functions and are safe to call even if softdevice is not enabled. Reference kindly provided by danh: https://devzone.nordicsemi.com/f/nordic-q-a/29553/sd_nvic_critical_region_enter-exit-missing-in-s130-v2
Switching to these as the default/only way to enable/disable interrupts simplifies things, and fixes several problems and potential problems:
* Interrupts at priority 2 or 3 could not call common_hal_mcu_disable_interrupts because the call to sd_softdevice_is_enabled would HardFault
* Hypothetically, the state of sd_softdevice_is_enabled could change from the disable to the enable call, meaning the calls would not match (__disable_irq() could be balanced with sd_nvic_critical_region_exit).
This also fixes a problem I believe would exist if disable() were called twice when SD is enabled. There is a single "is_nested_critical_region" flag, and the second call would set it to 1. Both of the enable() calls that followed would call critical_region_exit(1), and interrupts would not properly be reenabled. In the new version of the code, we use our own nesting_count value to track the intended state, so now nested disable()s only call critical_region_enter() once, only updating is_nested_critical_region once; and only the second enable() call will call critical_region_exit, with the right value of i_n_c_r.
Finally, in port_sleep_until_interrupt, if !sd_enabled, we really do need to __disable_irq, rather than using the common_hal_mcu routines; the reason why is documented in a comment.
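The nesting_count scheme described in this commit message can be sketched as follows. This is a minimal host-side illustration, not the code from the PR: fake_critical_region_enter/exit are hypothetical stand-ins for sd_nvic_critical_region_enter/exit that only record what happened, and disable_interrupts/enable_interrupts are illustrative names for the common_hal_mcu routines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SoftDevice critical-region calls.
   They only track state so the nesting logic can be observed on a host. */
static int enter_calls = 0;
static int exit_calls = 0;
static int irqs_masked = 0;

static void fake_critical_region_enter(uint8_t *is_nested) {
    *is_nested = (uint8_t)irqs_masked; /* 1 if already inside a region */
    irqs_masked = 1;
    enter_calls++;
}

static void fake_critical_region_exit(uint8_t is_nested) {
    if (!is_nested) {
        irqs_masked = 0; /* only the outermost exit unmasks */
    }
    exit_calls++;
}

/* Our own count of outstanding disable calls. */
static uint32_t nesting_count = 0;
static uint8_t is_nested_critical_region;

void disable_interrupts(void) {
    /* Only the first disable enters the critical region, so the
       single is_nested_critical_region flag is written exactly once. */
    if (nesting_count == 0) {
        fake_critical_region_enter(&is_nested_critical_region);
    }
    nesting_count++;
}

void enable_interrupts(void) {
    assert(nesting_count > 0); /* enable must balance a prior disable */
    nesting_count--;
    /* Only the last enable leaves the region, with the saved flag. */
    if (nesting_count == 0) {
        fake_critical_region_exit(is_nested_critical_region);
    }
}
```

With two nested disable calls, the region is entered once and left only on the second enable call, which is the behavior the message describes.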
In time, we should transition interrupt driven background tasks out of the overall run_background_tasks into distinct background callbacks, so that the number of checks that occur with each tick is reduced.
Testing performed: Played half of the Bartlebeats album :) :)
Before this, the mp3 file would be read into the in-memory buffer only when new samples were actually needed. This meant that the time to read mp3 content always counted against the ~22ms audio buffer length. Now, when there's at least 1 full disk block of free space in the input buffer, we can request that the buffer be filled _after_ returning from audiomp3_mp3file_get_buffer and actually filling the DMA pointers. In this way, the time taken for reading MP3 data from flash/SD is less likely to cause an underrun of audio DMA. The existing calls to fill the inbuf remain, but in most cases during streaming these become no-ops because the buffer will be over half full.
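A rough sketch of the deferred refill idea, under stated assumptions: the buffer sizes, the inbuf_t type, and the function names below are hypothetical and are not taken from the audiomp3 module. The file read is simulated by adjusting a fill counter, and reading only whole blocks is an assumption made to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCK_SIZE 512u  /* assumed disk block size */
#define INBUF_SIZE 2048u /* assumed input buffer capacity */

typedef struct {
    size_t fill;         /* bytes of undecoded MP3 data currently buffered */
    bool refill_pending; /* set when a background refill has been requested */
} inbuf_t;

static size_t inbuf_free(const inbuf_t *b) {
    return INBUF_SIZE - b->fill;
}

/* Audio path: consume input for one output buffer, then only *request*
   a refill. No storage access happens while the DMA deadline is running. */
void get_buffer(inbuf_t *b, size_t consumed) {
    if (consumed > b->fill) {
        consumed = b->fill;
    }
    b->fill -= consumed;
    if (inbuf_free(b) >= BLOCK_SIZE) {
        b->refill_pending = true;
    }
}

/* Background path: runs later, outside the time-critical section. */
void background_refill(inbuf_t *b) {
    if (!b->refill_pending) {
        return;
    }
    b->refill_pending = false;
    size_t blocks = inbuf_free(b) / BLOCK_SIZE;
    b->fill += blocks * BLOCK_SIZE; /* stands in for the actual file read */
    assert(b->fill <= INBUF_SIZE);
}
```

The point of the split is that the slow read happens after the audio buffer has already been handed over, so it no longer counts against the buffer's playback time.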
…to-refresh This is a step towards restoring the efficiency of the background tasks
CALLBACK_CRITICAL_BEGIN is heavyweight, but we can be confident we do not have work to do as long as callback_head is NULL. This gives back performance on nRF.
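The fast path can be illustrated with a small sketch. Only callback_head and the idea of CALLBACK_CRITICAL_BEGIN come from the commit message; the callback_t layout, critical_begin/critical_end, run_all, and bump are hypothetical names used for illustration, and the critical section here just counts how often it is entered.

```c
#include <assert.h>
#include <stddef.h>

typedef struct callback {
    void (*fun)(void *);
    void *data;
    struct callback *next;
} callback_t;

static callback_t *volatile callback_head = NULL;

/* Stand-ins for the heavyweight critical-section macros. */
static int critical_sections_entered = 0;
static void critical_begin(void) { critical_sections_entered++; }
static void critical_end(void) {}

void run_all(void) {
    /* Cheap check first: with an empty list there is no work to do,
       so the critical section is skipped entirely. */
    if (callback_head == NULL) {
        return;
    }
    /* Detach the whole list while interrupts are held off. */
    critical_begin();
    callback_t *cb = callback_head;
    callback_head = NULL;
    critical_end();
    /* Run the detached callbacks with interrupts enabled again. */
    while (cb != NULL) {
        callback_t *next = cb->next;
        cb->next = NULL;
        assert(cb->fun != NULL);
        cb->fun(cb->data);
        cb = next;
    }
}

/* Example callback: increments the int that data points to. */
static void bump(void *data) {
    (*(int *)data)++;
}
```

In this sketch, a callback queued by an interrupt just after the NULL check is not lost; it is simply picked up on the next call to run_all.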
2056ff6 to b445814
I think this is ready for testing and review. I have tested SAM D51, nRF52840, esp32s2, stm32f405 all at various points during this process and I'm no longer aware of any regressions.
b445814 to 81105cb
Thanks for this careful work. The nrf changes make sense to me. @jepler and I went over them in some detail.
This is an awesome cleanup! Thank you!
I believe there is one case you don't handle that is probably very rare and maybe not an issue. It is the case where you queue up background work like filling an mp3 buffer but delete the mp3 object before you do the background work. Do you think that is an issue?
A background callback must never outlive its related object. By collecting the head of the linked list of background tasks, this will not happen. One hypothetical case where this could happen is if an MP3Decoder is deleted while its callback to fill its buffer is scheduled.
@tannewt thanks for reminding me about that case. By making the callback list a GC root, this should not be possible. Please re-review.
@jepler are all callbacks allocated on the heap or is it a mix? If it's a mix then the collect won't work because it'll stop at the first pointer off the heap I believe. Looks like uchip is 4 bytes too big now unfortunately.
@jepler and I discussed some things in discord, new code looks good to me.
Looks good to me too! Excited to hear how it works for folks.
The "lower power" branch removed a previous optimization that allowed the background tasks to run just once every tick. Restore this for things that run on the basis of time, and where possible move things to run in response to interrupts instead.
The foundation for this is a new linked list of background callbacks. A background callback takes a function pointer and an optional object pointer. Usually, such callbacks are registered from interrupts. For instance, on atmel sam, the audio DMA interrupt is used to schedule a fill of the new sample data, rather than polling a flag every 1ms.
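To make the shape of this concrete, here is a minimal sketch of such a list, assuming a singly linked FIFO with a "queued" flag. All names (bg_callback_t, bg_callback_add, bg_callback_run_all, fake_dma_interrupt, fill_audio) are hypothetical and not necessarily those used in the PR, and interrupt masking around the list update is omitted because this runs on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*callback_fun_t)(void *data);

typedef struct bg_callback {
    callback_fun_t fun;       /* function to run in the background */
    void *data;               /* optional object pointer; may be NULL */
    struct bg_callback *next;
    bool queued;              /* true while the node is on the list */
} bg_callback_t;

static bg_callback_t *head = NULL;
static bg_callback_t *tail = NULL;

/* Typically called from an interrupt handler. A real port would mask
   interrupts around the list update; that is left out of this sketch. */
void bg_callback_add(bg_callback_t *cb, callback_fun_t fun, void *data) {
    assert(fun != NULL);
    if (cb->queued) {
        return; /* already scheduled; do not link it twice */
    }
    cb->fun = fun;
    cb->data = data;
    cb->next = NULL;
    cb->queued = true;
    if (tail != NULL) {
        tail->next = cb;
    } else {
        head = cb;
    }
    tail = cb;
}

/* Called from the main loop: detach the list, then run each entry once. */
void bg_callback_run_all(void) {
    bg_callback_t *cb = head;
    head = NULL;
    tail = NULL;
    while (cb != NULL) {
        bg_callback_t *next = cb->next;
        cb->next = NULL;
        cb->queued = false;
        cb->fun(cb->data);
        cb = next;
    }
}

/* Example: an audio DMA interrupt schedules a buffer fill. */
static int fills = 0;
static void fill_audio(void *data) {
    (void)data;
    fills++;
}
static bg_callback_t audio_cb;
void fake_dma_interrupt(void) {
    bg_callback_add(&audio_cb, fill_audio, NULL);
}
```

Because the node lives in the owning object (here a static variable), scheduling from an interrupt needs no allocation, and a second interrupt before the main loop runs does not queue the work twice.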
Current state of this PR:
Testing performed: on pygamer and grand central m4, did general tire kicking of related code:
Benchmarks: On Grand Central M4, I ran the following code
The regression from 5.3.0 to 6.0.0.alpha.1 is almost entirely fixed. However, the difference between the original timing of 1.02s and the new timing of 1.04s seems "real" and is not within the usual variability of execution time. (That the above loop takes about 1s on samd51 is a total coincidence; I didn't choose the numbers specially or anything!)
What's next: