precache() - preload code into the flash cache. #6628

ChocolateFrogsNuts · 2019-10-10T13:55:48Z

By preloading code into the flash cache, we can control when SPI flash
reads occur while code is executing.
This is useful where the timing of a section of code is extremely
critical and we don't want unpredictable pauses while code is pulled in
from the SPI flash chip.

It can also be useful for code that uses SPI0, which is connected to the
flash chip.

Code that is not an interrupt handler and is called infrequently, but
would otherwise have to live in valuable IRAM (such as bit-banging I/O
code or some code run at bootup), can avoid being permanently in IRAM.

Macros are provided to make it easy to precache one or more blocks of
code in any function.
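As a rough illustration of how the macros might be used, here is a host-compilable sketch. The names `PRECACHE_START` and `PRECACHE_END`, the `precache(function, bytes)` signature, and the helper `timed_sum` are assumptions for illustration (this PR's text only names `PRECACHE_ATTR` and `precache()`). The stub definitions merely stand in for the real ones so the example builds off-target with GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-ins (assumptions): on the ESP8266 the core would supply
   precache() and the PRECACHE_* macros. These stubs only record the call. */
static uint32_t precache_calls;      /* how many times the stub was invoked */
static uint32_t precache_last_bytes; /* length passed on the last call */

static void precache(void *f, uint32_t bytes) {
    (void)f;
    precache_calls++;
    precache_last_bytes = bytes;
}

/* The real attribute is described as also disabling block reordering;
   only noinline is applied here so the sketch builds with any GCC/Clang. */
#define PRECACHE_ATTR __attribute__((noinline))

/* Uses the GNU "labels as values" extension to measure the marked region. */
#define PRECACHE_START(tag) \
    precache(NULL, (uint32_t)((uint8_t *)&&_precache_end_##tag - \
                              (uint8_t *)&&_precache_start_##tag)); \
    _precache_start_##tag:

#define PRECACHE_END(tag) _precache_end_##tag:

/* Hypothetical timing-critical routine: the loop between the two markers
   is the region that would be pulled into the flash cache up front. */
PRECACHE_ATTR static uint32_t timed_sum(const uint32_t *data, uint32_t n) {
    uint32_t sum = 0;
    assert(n == 0 || data != NULL); /* validate before the timed region */

    PRECACHE_START(sum_loop);
    for (uint32_t i = 0; i < n; i++) {
        sum += data[i];
    }
    PRECACHE_END(sum_loop);

    return sum;
}
```

On the host the recorded length is not meaningful; the point is only the shape of the call site: mark the function with the attribute, then bracket the critical block with a start and end marker sharing one tag.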

ChocolateFrogsNuts · 2019-10-10T13:59:59Z

forms part of the solution for #6559

d-a-v · 2019-10-10T14:48:20Z

(I cancelled your CI, which is broken anyway, to get CI resource for my CI-fixing PR)

ChocolateFrogsNuts · 2019-10-10T15:10:43Z

Guess that's what I get for being slack and pushing something without a test compile... it should be ok now that I added the missing include though.

d-a-v · 2019-10-10T15:34:04Z

CI itself was broken, I wasn't commenting on your PR 😆
You should press the update button now, CI is fixed

Edit: you just did it :)

ChocolateFrogsNuts · 2019-10-10T15:35:43Z

ahh, yes I thought the logs looked a bit odd :)

ChocolateFrogsNuts · 2019-10-11T08:36:55Z

This might need a little more work. I think it's not getting the length calculation right under some circumstances, so it probably needs another attribute.

With certain alignments/lengths of code it was possible to not read enough into the flash cache. This commit makes the length calculation clearer and adds an extra cache line to ensure we precache enough code.
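To see why alignment matters, here is a small host-side sketch of the arithmetic. The 32-byte cache line size and the "floor plus two" formula are assumptions chosen to illustrate the idea of adding an extra line; all function names are hypothetical and this is not claimed to be the exact calculation in the commit.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u /* assumed cache line size in bytes */

/* Exact number of cache lines touched by the byte range [addr, addr+bytes). */
static uint32_t lines_spanned(uint32_t addr, uint32_t bytes) {
    if (bytes == 0) {
        return 0;
    }
    uint32_t first = addr / CACHE_LINE_SIZE;
    uint32_t last = (addr + bytes - 1) / CACHE_LINE_SIZE;
    return last - first + 1;
}

/* Naive estimate: round the length up to whole lines. This ignores where
   the code starts within a line, so it can come up one line short. */
static uint32_t lines_round_up(uint32_t bytes) {
    return (bytes + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE;
}

/* Conservative estimate: whole lines plus two, which covers a partial
   line at the start and a partial line at the end. */
static uint32_t lines_to_preload(uint32_t bytes) {
    return bytes / CACHE_LINE_SIZE + 2;
}

/* Host-side check: for every start offset within a line and every length
   up to max_bytes, the conservative estimate covers the exact span. */
static void check_conservative_bound(uint32_t max_bytes) {
    for (uint32_t bytes = 1; bytes <= max_bytes; bytes++) {
        for (uint32_t off = 0; off < CACHE_LINE_SIZE; off++) {
            assert(lines_to_preload(bytes) >= lines_spanned(off, bytes));
        }
    }
}
```

For example, 4 bytes of code starting at offset 30 within a line straddle two lines, while rounding the length up gives only one. Reading a line too many costs a single extra flash access, whereas reading a line too few defeats the purpose.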

Precached code needs to be noinline to ensure the no-reorder-blocks is applied.

earlephilhower · 2019-10-13T19:04:41Z

There's one issue I see here, and I'm not sure how to handle it.

This CPU has no instruction for loading a 32-bit immediate, so it keeps 32-bit constants in a .literal table and reads them with a PC-relative load (l32r).

GCC doesn't include the .literal section in the function proper. I've seen it placed both before and after the function (I think the linker might be involved with this, but I haven't dug into it).

So I'm not sure this is doable in the general case. You need a 32-bit literal to access any constant address (e.g. GPIO), to access any global variable or static class member, or to do any operation (compare, etc.) with a 32-bit value...

ChocolateFrogsNuts · 2019-10-14T00:44:01Z

From what I've seen in the disassembled code, those literals can come from anywhere in the code. The compiler prefers to place them beside the function, but it also seems to re-use nearby values. I don't think there's any way we can compensate for that transparently. What we can do is handle it with documentation: "access your constants before the precached code".
So far I haven't seen any problem with loading the addresses of the registers accessed in the code I'm using, but perhaps that is because the first accesses cache enough of the constants early enough.
The issue could easily be avoided in the end code by loading a base address into a variable and accessing the GPIO/registers/whatever as base+offset (or just loading all constants into variables).
Again, this comes down to documentation and the general rule of keeping your critical sections of code as short and minimal as possible.
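The base+offset idea could look something like the sketch below. Everything in it is hypothetical: the register offsets, the `fake_regs` array standing in for a memory-mapped block, and the function names are made up so the example runs on a host. Whether a given compiler really avoids literal loads inside the critical section still has to be confirmed in the disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout: byte offsets from the block base. */
#define REG_CMD  0x00u
#define REG_ADDR 0x04u
#define REG_DATA 0x08u

/* Host stand-in for a memory-mapped register block (assumption). */
static uint32_t fake_regs[4];

/* Base address kept in a variable rather than written as a constant at
   each access. volatile stops the compiler folding it back into
   per-access constants. */
static volatile uintptr_t reg_base;

static inline void reg_write(uintptr_t base, uint32_t offset, uint32_t value) {
    *(volatile uint32_t *)(base + offset) = value;
}

static inline uint32_t reg_read(uintptr_t base, uint32_t offset) {
    return *(volatile uint32_t *)(base + offset);
}

/* Loads the base once, before the critical section, then only uses
   register-relative accesses with small constant offsets. */
static uint32_t issue_command(uint32_t cmd, uint32_t addr) {
    uintptr_t base = reg_base; /* the only load that needs a full address */
    assert(base != 0);

    /* --- timing-critical section starts here --- */
    reg_write(base, REG_ADDR, addr);
    reg_write(base, REG_CMD, cmd);
    return reg_read(base, REG_DATA);
}
```

The design choice is simply to move every full 32-bit constant out of the critical region: the base is fetched up front, and the small offsets can be encoded directly in the load and store instructions.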

ChocolateFrogsNuts · 2019-10-14T02:01:35Z

Further to the above...
I modified my code that uses this to access the SPI registers as base+offset. It wasn't too hard: just a volatile variable for the base and a macro to convert the register names to offsets.
That removed all the l32 instructions referencing flash and replaced them with l32 instructions referencing a register+offset.
What I also noticed is that function calls are a potential problem.
They are often (usually) done by loading the address of the function as a constant from flash and then making the call. We may need to document "no function calls". I haven't noticed a problem with memcpy, but that could be because memcpy is called frequently enough for its address to be in cache.
Now that size is less important in my code, I might replace the memcpy calls with for loops in the SPI0 code just to be sure :-)
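Replacing a memcpy call with a loop might be sketched as follows. The function names are hypothetical, and the volatile qualifiers are an assumption about how to keep the compiler from recognising the loop and turning it back into a library call; the generated code should still be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word-by-word copy with no function call inside the loop. The volatile
   qualifiers force one load and one store per word, so the compiler
   cannot substitute a call to memcpy. */
static inline void copy_words(volatile uint32_t *dst,
                              const volatile uint32_t *src,
                              size_t words) {
    for (size_t i = 0; i < words; i++) {
        dst[i] = src[i];
    }
}

/* Host-side check only; assert() would not belong in the critical path. */
static void copy_words_selftest(void) {
    uint32_t src[3] = {0x11u, 0x22u, 0x33u};
    uint32_t dst[3] = {0u, 0u, 0u};
    copy_words(dst, src, 3);
    for (size_t i = 0; i < 3; i++) {
        assert(dst[i] == src[i]);
    }
}
```

The loop is larger and slower than a tuned memcpy, which matches the trade-off described above: once size matters less, avoiding the call removes one more literal load from the critical section.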

ChocolateFrogsNuts · 2019-10-14T08:25:13Z

Ha! It took some ninja level gymnastics with 'volatile' and function pointers but I got SPI_command() to build without any load32 instructions referring to flash in the precached code.
No changes required to this PR though.

One point for documentation though: only call functions in IRAM or ROM. Basically the same rules as an interrupt handler.

devyte · 2019-10-29T04:50:04Z

Closing in favor of #6674.

devyte · 2019-11-03T07:49:26Z

Reopening this for merge given that the rest of #6674 needs further discussion.

Fix missing include

acde951

Merge branch 'master' into precache

482310d

devyte self-requested a review October 10, 2019 21:32

ChocolateFrogsNuts added 3 commits October 11, 2019 14:09

Make precache extern "C"

e74736c

Merge branch 'precache' of https://github.com/ChocolateFrogsNuts/Arduino into precache

fdbe442

Attempt 2 at making precache extern "C"

2cbdb41

ChocolateFrogsNuts added 2 commits October 11, 2019 17:39

Fix calculation of number of cache lines to preload

0a7862a

Add noinline to PRECACHE_ATTR macro

25321da

devyte approved these changes Oct 14, 2019

ChocolateFrogsNuts and others added 6 commits October 16, 2019 19:15

Merge branch 'master' into precache

76a8030

Merge branch 'master' into precache

3e3facc

Merge branch 'master' into precache

94e4b70

Merge branch 'precache' of https://github.com/ChocolateFrogsNuts/Arduino into precache

94c833a

Merge branch 'master' into precache

af3e435

Merge branch 'master' into precache

15e459c

devyte approved these changes Oct 28, 2019

Merge branch 'master' into precache

6aee96b

devyte mentioned this pull request Oct 28, 2019

Spi0command #6674

Merged

devyte closed this Oct 29, 2019

devyte reopened this Nov 3, 2019

Merge branch 'master' into precache

da976ce

devyte merged commit 692e542 into esp8266:master Nov 3, 2019

OttoWinter mentioned this pull request Nov 9, 2019

ESP8266 Arduino Core v2.6.0 esphome/feature-requests#471

Closed

19 tasks

mcspr mentioned this pull request Apr 2, 2024

Add option to force I2S routines into IRAM #9115

Open

6 tasks
