Skip to content

nrf5x: poll isr for dma status if irqs are disabled #396

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

xobs
Copy link
Collaborator

@xobs xobs commented May 8, 2020

The tud_poll() function is used to clear the tinyusb buffers and
respond to various events in the main thread.

This is important in order to properly utilise the wfi instruction.

When an interrupt appears, the appropriate handler is called and
tinyusb polls it in a "top half" function. It then adds events to
a queue which will be handled in a "bottom half" function in the
main thread.

If an interrupt comes in between running the "bottom half" function
and issuing a "wfi", then there will be stale data in the queue
and the host will be left with potentially unanswered data.

In order to avoid this scenario, the following sequence of events
needs to happen:

  1. Disable interrupts
  2. Flush the queue by calling tud_task()
  3. Issue WFI
  4. Enable interrupts

If an interrupt comes in between (2) and (3), the WFI will return
immediately even though interrupts are disabled.

Because of this, tud_task() needs to be able to run without
interrupts.

Modify the nrf52 code so that it polls the ISR if interrupts are not
enabled. This prevents a deadlock waiting for DMA to finish, ensuring
this can be run with interrupts disabled.

@xobs
Copy link
Collaborator Author

xobs commented May 8, 2020

I'm not sure if this is the proper approach, but I do think that tud_task() shouldn't block indefinitely when interrupts are disabled.

The `tud_poll()` function is used to clear the tinyusb buffers and
respond to various events in the main thread.

This is important in order to properly utilise the `wfi` instruction.

When an interrupt appears, the appropriate handler is called and
tinyusb polls it in a "top half" function.  It then adds events to
a queue which will be handled in a "bottom half" function in the
main thread.

If an interrupt comes in between running the "bottom half" function
and issuing a "wfi", then there will be stale data in the queue
and the host will be left with potentially unanswered data.

In order to avoid this scenario, the following sequence of events
needs to happen:

    1. Disable interrupts
    2. Flush the queue by calling `tud_task()`
    3. Issue WFI
    4. Enable interrupts

If an interrupt comes in between (2) and (3), the WFI will return
immediately even though interrupts are disabled.

Because of this, `tud_poll()` needs to be able to run without
interrupts.

Modify the nrf52 code so that it polls the ISR if interrupts are not
enabled.  This prevents a deadlock waiting for DMA to finish, ensuring
this can be run with interrupts disabled.

Signed-off-by: Sean Cross <[email protected]>
@hathach
Copy link
Owner

hathach commented May 8, 2020

hmm, this is very neat. I didn't know that it could be an issue with DMA. Nordic insist software driver must make sure there is only one DMA enabled at a time, else something bad out of specs happens. I did it by using tud task as a way to deferred ISR call, waiting for DMA complete to make sure there is no race condition between endpoints.

This is rather annoying, give me a day or two, I will try to mimic the WFI on my repo example (since it is simpler and easier to troubleshoot that cpy) to have a full understanding of the issue (is currently busy with bootloader work).

@xobs
Copy link
Collaborator Author

xobs commented May 8, 2020

By the way, with this patch, coupled with adafruit/circuitpython#2868, this board is drawing about 10 uA during sleep. It's able to be reliably plugged in and unplugged from USB, it can log to internal flash, and it can generally operate as you'd expect. I think that something like this will be very important when operating in low power modes.

while ( _dcd.dma_running ) { }
while ( _dcd.dma_running ) {
// Manually poll the handler if interrupts are disabled
if ( __get_PRIMASK() )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to check the USB IRQ status here too? Otherwise other interrupts will cause dcd_int_handler to run.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This states that it will only poll the USB ISR when all interrupts are disabled. __disable_irq() works by setting PRIMASK to 1. This simply polls the ISR in a tight loop, regardless of whether the USB IRQ is enabled.

It doesn't care whether other interrupts are enabled, and it doesn't care if there's a pending interurpt or not. It will still poll regardless.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, there shouldn't be issue with this block waiting, I add a bit of logging and don't see it ever wait or defer the dma

// helper to start DMA
static void edpt_dma_start(volatile uint32_t* reg_startep)
{

  // Only one dma can be active
  if ( _dcd.dma_running )
  {
    if (SCB->ICSR & SCB_ICSR_VECTACTIVE_Msk)
    {
        TU_LOG2("dma deferred\r\n");
      // If called within ISR, use usbd task to defer later
      usbd_defer_func( (osal_task_func_t) edpt_dma_start, (void*) reg_startep, true );
      return;
    }
    else
    {
        TU_LOG2("dma waiting\r\n");
      // Otherwise simply block wait
      while ( _dcd.dma_running )  { }
    }
  }

  _dcd.dma_running = true;
  TU_LOG2("dma started\r\n");

  (*reg_startep) = 1;
  __ISB(); __DSB();
}

// DMA is complete
static void edpt_dma_end(void)
{
  TU_ASSERT(_dcd.dma_running, );
  _dcd.dma_running = false;

  TU_LOG2("dma ended\r\n");
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that _dcd.dma_running is set by the interrupt handler, and if you disable interrupts then _dcd.dma_running effectively becomes 1, meaning it will never exit.

@xobs
Copy link
Collaborator Author

xobs commented May 13, 2020

What is the status of this patch? Have you had a chance to replicate the issue, or to come up with another approach to preventing deadlocks in tud_task() when interrupts are disabled?

@hathach
Copy link
Owner

hathach commented May 13, 2020

@xobs hi, I was buy with nrf bootloader work and haven’t tried it out. I will do as soon as I could and let you know.

@hathach
Copy link
Owner

hathach commented May 18, 2020

@xobs I will start to work on this issue tomorrow, I am lacking behind this. I checked your issue adafruit/circuitpython#2868 but still didn't see steps on how to reproduce it. Also where is the tud_poll() 🤔 🤔

@xobs
Copy link
Collaborator Author

xobs commented May 19, 2020

The steps (in Circuit Python) are:

  1. Run import time; time.sleep(60)
  2. Send some data

I'm not sure what tud_poll() is, I don't see that function defined at all.

You ought to be able to replicate this by examining the _usbd_q object prior to issuing a WFI. Try this:

  1. Call tud_task() to drain the buffer
  2. Do some processing. Maybe just sleep for 5000 ms. During this period, send some data.
  3. Examine _usbd_q.ff.wr_idx and _usbd_q.ff.rd_idx
  4. Issue WFI
  5. Loop to 1, if the host hasn't timed out the device yet.

If _usbd_q.ff.wr_idx != _usbd_q.ff.rd_idx in step 3, then the host will time out the device since the device will never respond.

@hathach
Copy link
Owner

hathach commented May 19, 2020

The steps (in Circuit Python) are:

  1. Run import time; time.sleep(60)
  2. Send some data

Ah I see, I could reproduce the issue pasting these to code.py

I'm not sure what tud_poll() is, I don't see that function defined at all.

hmm, it is mentioned in this PR description.

You ought to be able to replicate this by examining the _usbd_q object prior to issuing a WFI. Try this:

  1. Call tud_task() to drain the buffer
  2. Do some processing. Maybe just sleep for 5000 ms. During this period, send some data.
  3. Examine _usbd_q.ff.wr_idx and _usbd_q.ff.rd_idx
  4. Issue WFI
  5. Loop to 1, if the host hasn't timed out the device yet.

If _usbd_q.ff.wr_idx != _usbd_q.ff.rd_idx in step 3, then the host will time out the device since the device will never respond.

That is lots of steps to do, I am not familiar with gdb, I use segger ozone for debugging, though it seems to have some issue running current cpython (always hit hardfault). I try to find a way to troubleshoot it.

UPDATE: it seems like the enumeration is still complete, the sleep(60) significantly slow USB down.

@xobs
Copy link
Collaborator Author

xobs commented May 19, 2020

Yes, it will finish enumerating when the board wakes up. Provided it hasn't timed out by then. You also might not be able to issue "Ctrl-C", but I can't recall if that was one of the symptoms.

I'm not familiar with Segger Ozone, but I do know that J-Link GDB Server works. The issue I kept running into there was that their tools seem to assume you want to start at _start, but the boards really want to start at 0, so with J-Link GDB Server I always had to do the following:

$ make BOARD=simmel
$ gdb -ex 'tar rem :2331' build-simmel/firmware.elf
(gdb) make BOARD=simmel
(gdb) load
(gdb) mon reset 0
(gdb) c

I'm not sure how to tell Ozone to set $pc to 0.

@hathach
Copy link
Owner

hathach commented May 19, 2020

I added the timestamp before and after the WFI (while the code.py does sleep(50) ).

    if (sd_enabled) {
        sd_app_evt_wait();
    } else {
        // Call wait for interrupt ourselves if the SD isn't enabled.
        log_rtt_printf("%ld: WFI start\r\n", supervisor_ticks_ms32());
        __WFI();
        log_rtt_printf("%ld: WFI end\r\n", supervisor_ticks_ms32());
    }

the log is printed via SEGGER RTT therefore it is less interfere with the system. nRF does many full sleep cycles while enumerating. Possibly due to the

  1. Interrupt part of USB is complete and submitted to the event queue
  2. WFI happens
  3. Since tud_task() isn't called yet to response, there is no more USB event. CPU has a good full sleep of 50 seconds.
  4. WFI end, tud_task() called to handle usb events

It seems to be the racing between usb isr -> event queue -> tud_task() handling and WFI. More investigating ....

USBD init
CDC init
MSC init
HID init
MIDI init
USBD Bus Reset
813: WFI start
906: WFI end
USBD Bus Reset
906: WFI start
1034: WFI end
USBD Setup Received 80 06 00 01 00 00 40 00 
  Get Descriptor Device
  Queue EP 80 with 18 bytes
dma started
dma ended
1034: WFI start
6145: WFI end <---- **full sleep**
USBD Xfer Complete on EP 80 with 18 bytes
  Queue EP 00 with zlp Status
dma started
dma ended
USBD Bus Reset
USBD Xfer Complete on EP 00 with 0 bytes
6146: WFI start
6273: WFI end
6273: WFI start
6293: WFI end
USBD Setup Received 80 06 00 01 00 00 12 00 
  Get Descriptor Device
  Queue EP 80 with 18 bytes
dma started
dma ended
6293: WFI start
11521: WFI end
USBD Xfer Complete on EP 80 with 18 bytes
  Queue EP 00 with zlp Status
dma started
dma ended
USBD Bus Reset
USBD Xfer Complete on EP 00 with 0 bytes
11522: WFI start
11649: WFI end
USBD Setup Received 80 06 00 01 00 00 40 00 
  Get Descriptor Device
  Queue EP 80 with 18 bytes
dma started
dma ended
11650: WFI start
16897: WFI end
USBD Xfer Complete on EP 80 with 18 bytes
  Queue EP 00 with zlp Status
dma started
dma ended

@xobs
Copy link
Collaborator Author

xobs commented May 19, 2020

Correct. Note that WFI will return as long as an interrupt is pending. That's why adafruit/circuitpython#2868 disables interrupts, calls background tasks (including tud_task()), then issues WFI -- if a second packet comes in between the handling of tud_task() and WFI, then WFI will resume immediately.

The patch here just makes it so that tud_task() can be called with interrupts disabled.

@hathach
Copy link
Owner

hathach commented May 19, 2020

Yeah, though I am not sure if it is needed to call dcd_int_handler() in non-isr context as this PR suggest though. I have a feeling we don't have to run background task when irq is disabled and right before WFI(). It doesn't seem to be the norm, though I am not so sure, I will do more tests.

@xobs
Copy link
Collaborator Author

xobs commented May 20, 2020

This is obsoleted by the inclusion of tud_task_event_ready(). However, if it is considered a bug that calling tud_task() with interrupts disabled can hang the system, then this patch should still be considered for inclusion.

@hathach
Copy link
Owner

hathach commented May 20, 2020

This is obsoleted by the inclusion of tud_task_event_ready(). However, if it is considered a bug that calling tud_task() with interrupts disabled can hang the system, then this patch should still be considered for inclusion.

Yeah, you are right on this, Though we probably only only need to check DMA Start/End instead of invoking the dcd_int_handler(). I am looking at the nrf5x specs just now. This approach should be enough to get out of hanging situation.

@hathach
Copy link
Owner

hathach commented May 22, 2020

closed in prefer of #416, thank you very much for bringing the issue up

@hathach hathach closed this May 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants