-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add DWC2 cache maintenance routines for STM32 #2963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
7b6b813
to
adb3c80
Compare
Looks like my HIL instance has license issue, I think we can add the env locally. |
Works great for me, both for CDC and UVC. Thanks ! In addition to |
Thanks for your test. |
Thanks @HiFiPhile for great Pr as usual. Though I am off for TET (Lunar New Year) and won't be able to review this in 2 weeks. Happy New Year 🎉 |
I don't think so. I'm using a custom RTOS which relies on its own set of headers, that's why. |
Happy new year also 🎊 |
d349c12
to
6c9a081
Compare
{ | ||
__NONCACHEABLEBUFFER_BEGIN = .;/* create symbol for start of section */ | ||
KEEP(*(noncacheable)) | ||
__NONCACHEABLEBUFFER_END = .; /* create symbol for start of section */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is wrong. It should be end of section
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, it's copied from STM32CubeH7RS.
Signed-off-by: HiFiPhile <[email protected]>
@hathach I just rebased on master and did some cleanup. |
How about adding the CFG_TUD_MEM_ALIGN_BYTE/SIZE instead, CFG_TUD_MEM_ALIGN if not defined will be defined using that. That way people will have time to migrate (or just leave it if they don't need it). Btw, why you need the numeric value (haven't do review yet). |
Yeah it's a good idea. I need to align the buffer correctly if user need a higher alignment than cache line size. Update: no more need to change |
21282bb
to
15f32f2
Compare
Signed-off-by: HiFiPhile <[email protected]>
Signed-off-by: HiFiPhile <[email protected]>
Signed-off-by: HiFiPhile <[email protected]>
Signed-off-by: HiFiPhile <[email protected]>
@@ -196,6 +197,11 @@ SECTIONS | |||
. = ALIGN(8); | |||
} >DTCM | |||
|
|||
.dtcm_data : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove this, and use the noncacheable_buffer
instead, I like to keep thing closed to stock stm32 linker as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's possible, MPU config need to be added to BSP to make noncacheable_buffer
really non-cached. And I forgot this when I enable DCache for F7 & H7...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait, does the current code configure this section as non-cacheable, while noncacheable_buffer section need extra works, or both of them need to be configured the same way ?
if removing this requires extra works, then we can keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dtcm_data
is only used by RTT to have a non-cached buffer, DWC2 doesn't relied on this.
I've added SCB_EnableDCache
for benchmark only, to keep it simple we can also keep the DCache disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it is possible to use the default noncacheable_buffer section instead of adding new one. That would allow us to use the stock IAR linker as well ? You may want to pull first, since, I push the fix for the linker issue with clang (need to have memory value before provide, and drop the READONLY keywords)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it is possible to use the default noncacheable_buffer section instead of adding new one. That would allow us to use the stock IAR linker as well ?
I think so, hope MPU config generated by Cube support both compilers. But the default size of 0x400 is too small for RTT buffer.
src/common/tusb_types.h
Outdated
@@ -36,39 +36,39 @@ | |||
#endif | |||
|
|||
//------------- Device DCache declaration -------------// | |||
#define TUD_EPBUF_DCACHE_SIZE(_size) (CFG_TUD_MEM_DCACHE_ENABLE ? \ | |||
#define TUD_EPBUF_DCACHE_SIZE(_size) (TUD_EPBUF_DCACHE_ALIGNED ? \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why we need to replace CFG_TUD_MEM_DCACHE_ENABLE by TUD_EPBUF_DCACHE_ALIGNED, DCACHE when enabled is certainly required the memory to be in cache line alginment ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea is to control DCache alignment not only by CFG_TUD_MEM_DCACHE_ENABLE
but by the necessity:
-
I set
CFG_TUD_MEM_DCACHE_ENABLE_DEFAULT=1
for MCUs with internal cache soCFG_TUD_MEM_DCACHE_ENABLE
doesn't need to be set by the user, especially during family migration it could be forgot. -
For DWC2 DCache alignment is only needed when DMA is enabled, without
TUD_EPBUF_DCACHE_ALIGNED
it will be aligned by default even not needed. -
User imposed alignment
CFG_TUD_MEM_ALIGN
always works since the alignment ofTUD_EPBUF_DEF
union is determined by the most strict member.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the point to introduce TUD_EPBUF_DCACHE_ALIGNED
macro, the existing CFG_TUD_MEM_DCACHE_ENABLE and CFG_TUD_MEM_DCACHE_ENABLE_DEFAULT should be good enough, the default value of dwc2 can be CFG_TUD_DWC2_DMA_ENABLE (same as P4). The setting is rather comprehensive already .
#define CFG_TUD_MEM_DCACHE_ENABLE_DEFAULT CFG_TUD_DWC2_DMA_ENABLE
#define CFG_TUH_MEM_DCACHE_ENABLE_DEFAULT CFG_TUH_DWC2_DMA_ENABLE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, let me keep it simple.
Signed-off-by: HiFiPhile <[email protected]>
I've updated BSP:
|
Signed-off-by: HiFiPhile <[email protected]>
Describe the PR
Now
#define CFG_TUD_DWC2_DMA_ENABLE 1
is enough.Rebased on master and cleaned up.
It's prefer to declare a non-cached region with MPU instead of rely on cache invalidate+clean, benchmark on STM32H7S3 and i.MX RT1170 shows frequent cache invalidate+clean really hurts performance.
Benchmark code