Skip to content

Use a linked list of background tasks to perform #2879

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jul 17, 2020

Conversation

jepler
Copy link

@jepler jepler commented May 11, 2020

The "lower power" branch removed a previous optimization that allowed the background tasks to run just once every tick. Restore this, for things that run on the basis of time; and where possible move things to happening based on interrupts.

The foundation for this is a new linked list of background callbacks. A background callback takes a function pointer and an optional object pointer. Usually, such callbacks are registered from interrupts. For instance, on atmel sam, the audio DMA interrupt is used to schedule a fill of the new sample data, rather than polling a flag every 1ms.

Current state of this PR:

  • Builds on a range of boards
  • transitions all boards back to running background tasks just once per ms, though only when ticks are enabled
  • transitions sam d20/d5x/e5x to using interrupts for audio and usb

Testing performed: on pygamer and grand central m4, did general tire kicking of related code:

  • audio code
  • usb
  • filesystem
  • eink display
  • lcd display
  • gamepad

Benchmarks: On Grand Central M4, I ran the following code

import time
t0 = time.monotonic_ns()
k = 0
for i in range(1000):
    for j in range(100):
        k += j
t1 = time.monotonic_ns()
print(k)
print((t1-t0) / 1e9)

The regression from 5.3.0 to 6.0.0.alpha.1 is almost entirely fixed. However, the difference between the original timing of 1.02s and the new timing of 1.04s seems "real" and is not within the usual variability of execution time. (that the above loop takes about 1s on samd51 is a total coincidence, I didn't choose the numbers specially or anything!)

version time
5.3.0 1.02s
6.0.0.a1 1.54s
this PR 1.04s

What's next:

  • test & benchmark more boards
  • get a green from Actions
  • possible strategic conversion of more stuff to interrupt-driven

Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! My brain will start thinking through this approach. It's not what I was thinking which means it is probably better. :-)

@tannewt tannewt changed the base branch from master to main June 9, 2020 20:05
@jepler
Copy link
Author

jepler commented Jun 25, 2020

Discussion with @tannewt

  • go ahead with this approach
  • make sure that the case of background callbacks becoming invalid during GC heap reset

Which modules to do next

  • the rest of audio (it's "the hardest")
  • displayio second
  • anything that has open issues about it not working in main

@jepler jepler force-pushed the background-callback branch from bbd9436 to 7944f50 Compare July 9, 2020 16:33
@jepler jepler changed the title Add, demonstrate use of new background task list functionality Use a linked list of background tasks to perform Jul 9, 2020
@jepler
Copy link
Author

jepler commented Jul 9, 2020

The initial comment has been heavily revised to reflect the current status of this PR. Things are progressing nicely. SAM D5x/E5x is probably done, though as always there may be problems that only become obvious after testing, such as in the next alpha release.

@jepler
Copy link
Author

jepler commented Jul 9, 2020

The performance gained back on the CLUE is much smaller... but it has not been given specific attention yet.

version time
5.3.0 2.65s
6.0.0.a1 4.66s
this PR 4.112s

@jepler
Copy link
Author

jepler commented Jul 10, 2020

With the latest changes performance on nRF is a bit better than the 5.3.0 baseline (now testing on Particle Xenon).

version time
5.3.0 2.24s
6.0.0.a1 3.04s
this PR 2.22s

I suspect the 5.3.0 performance of the Xenon is better than the CLUE because the latter has a displayio display enabled. However, I didn't test the theory.

jepler added 8 commits July 15, 2020 09:26
The motivation for doing this is so that we can allow
common_hal_mcu_disable_interrupts in IRQ context, something that works
on other ports, but not on nRF with SD enabled.  This is because
when SD is enabled, calling sd_softdevice_is_enabled in the context
of an interrupt with priority 2 or 3 causes a HardFault.  We have chosen
to give the USB interrupt priority 2 on nRF, the highest priority that
is compatible with SD.

Since at least SoftDevice s130 v2.0.1, sd_nvic_critical_region_enter/exit
have been implemented as inline functions and are safe to call even if
softdevice is not enabled.  Reference kindly provided by danh:
 https://devzone.nordicsemi.com/f/nordic-q-a/29553/sd_nvic_critical_region_enter-exit-missing-in-s130-v2

Switching to these as the default/only way to enable/disable interrupts
simplifies things, and fixes several problems and potential problems:
 * Interrupts at priority 2 or 3 could not call common_hal_mcu_disable_interrupts
   because the call to sd_softdevice_is_enabled would HardFault
 * Hypothetically, the state of sd_softdevice_is_enabled
   could change from the disable to the enable call, meaning the calls
   would not match (__disable_irq() could be balanced with
   sd_nvic_critical_region_exit).

This also fixes a problem I believe would exist if disable() were called
twice when SD is enabled.  There is a single "is_nested_critical_region"
flag, and the second call would set it to 1.  Both of the enable()
calls that followed would call critical_region_exit(1), and interrupts
would not properly be reenabled.  In the new version of the code,
we use our own nesting_count value to track the intended state, so
now nested disable()s only call critical_region_enter() once, only
updating is_nested_critical_region once; and only the second enable()
call will call critical_region_exit, with the right value of i_n_c_r.

Finally, in port_sleep_until_interrupt, if !sd_enabled, we really do
need to __disable_irq, rather than using the common_hal_mcu routines;
the reason why is documented in a comment.
In time, we should transition interrupt driven background tasks out of the
overall run_background_tasks into distinct background callbacks,
so that the number of checks that occur with each tick is reduced.
Testing performed: Played half of the Bartlebeats album :) :)
Before this, the mp3 file would be read into the in-memory buffer
only when new samples were actually needed.  This meant that the time
to read mp3 content always counted against the ~22ms audio buffer length.

Now, when there's at least 1 full disk block of free space in the input
buffer, we can request that the buffer be filled _after_ returning from
audiomp3_mp3file_get_buffer and actually filling the DMA pointers.  In
this way, the time taken for reading MP3 data from flash/SD is less
likely to cause an underrun of audio DMA.

The existing calls to fill the inbuf remain, but in most cases during
streaming these become no-ops because the buffer will be over half full.
…to-refresh

This is a step towards restoring the efficiency of the background
tasks
CALLBACK_CRITICAL_BEGIN is heavyweight, but we can be confident we do
not have work to do as long as callback_head is NULL.

This gives back performance on nRF.
@jepler jepler force-pushed the background-callback branch from 2056ff6 to b445814 Compare July 15, 2020 14:33
@jepler jepler requested a review from tannewt July 15, 2020 16:55
@jepler jepler marked this pull request as ready for review July 15, 2020 16:55
@jepler
Copy link
Author

jepler commented Jul 15, 2020

I think this is ready for testing and review. I have tested SAM D51, nRF52840, esp32s2, stm32f405 all at various points during this process and I'm no longer aware of any regressions.

@jepler jepler force-pushed the background-callback branch from b445814 to 81105cb Compare July 15, 2020 16:58
dhalbert
dhalbert previously approved these changes Jul 16, 2020
Copy link
Collaborator

@dhalbert dhalbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this careful work. The nrf changes make sense to me. @jepler and I went over them in some detail.

Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an awesome cleanup! Thank you!

I believe there is one case you don't handle that is probably very rare and maybe not an issue. It is the case where you queue up background work like filling an mp3 buffer but delete the mp3 object before you do the background work. Think that is an issue?

A background callback must never outlive its related object.  By
collecting the head of the linked list of background tasks, this will
not happen.

One hypothetical case where this could happen is if an MP3Decoder is
deleted while its callback to fill its buffer is scheduled.
@jepler
Copy link
Author

jepler commented Jul 17, 2020

@tannewt thanks for reminding me about that case. By making the callback list a GC root, this should not be possible. Please re-review.

@jepler jepler requested review from tannewt and dhalbert July 17, 2020 13:37
@tannewt
Copy link
Member

tannewt commented Jul 17, 2020

@jepler are all callbacks allocated on the heap or is it a mix? If it's a mix then the collect won't work because it'll stop at the first pointer off the heap I believe.

Looks like uchip is 4 bytes too big now unfortunately.

Copy link
Collaborator

@dhalbert dhalbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jepler and I discussed some things in discord, new code looks good to me.

Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me too! Excited to hear how it works for folks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants