-
Notifications
You must be signed in to change notification settings - Fork 854
Found a bug which causes tx_byte_allocate() not thread-safe when use -O3 optimization!! #334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hi, |
Historically, GCC -O3 optimization did not produce reliable code. I would not recommend using -O3 until/unless the entire ThreadX source is built with -O3 and passes all verification tests. That said, it would be useful to issue a pull request for your proposed change so it can be evaluated further. |
This is very cool bug. |
This is indeed an interesting issue. I like using volatile more than
calling a dummy function - just from the perspective of performance. In
your fix, did you just add volatile to the pool_ptr API parameter?
…On Sun, Feb 4, 2024 at 12:12 AM amgross ***@***.***> wrote:
This is very cool bug.
Probably calling dummy function from other c file will resolve it as the
compiler now can't know if the global changed.
Other approach will be to define pool_ptr to be pointer to volatile.
What do you think about it @yuhiping <https://github.com/yuhiping> ?
—
Reply to this email directly, view it on GitHub
<#334 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/A3YCNAN3HI6MPUDD6JQ3YUDYR47GLAVCNFSM6AAAAABA2ZUUZ2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMRVGYZDKNRZGY>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
I didn't tried the fix, but indeed adding the volatile to the parameter is what I thought |
@amgross @williamelamie Sorry for the late reply. Yes, as you suggested, adding the volatile to the parameter was just our solution. Moreover, Is it still necessary to issue a pull request ? |
Hey @williamelamie I actually don't think this is the right solution, it exposes the underlying implementation via a public interface and is not very localized. Personally I think the function should make local copies while appropriately managing variable properties (eg: volatile). This doesn't address potentially issues that may arise in SMP, for example a barrier might be needed in SMP designs and the use of VOLATILE for the function prototype won't solve that. I think a cleaner alternative to keep the changes localized to this function are create a local variable that is volatile and keep the function signature the same I duplicated this with the same compiler version compiling for M33 (gcc 6.2) as well as GCC 8. The below 2 line solution fixes the issue for me and we don't need to change the public API now. Note that this bug is also present with -O2 as well. With optimizing for size (-Os) or debug (-Og), which are the flags most embedded developers use, the bug is not present. This is actually a pretty nasty little problem... Here's my fix
original assembly
and using the local volatile it's fixed
|
I agree Pat. It is nicer to have the use of volatile hidden from the outside. |
Another option would be to make the tx_byte_pool_owner structure member in TX_BYTE_POOL volatile, like the following:
This is a slightly smaller change and something that might be good to apply to the head of each suspension list, since they are used by the tx_*_prioritize functions in a similar manner as tx_byte_pool_owner structure. |
Is it possible that this is an issue with the cpu-specific port? #define TX_RESTORE asm volatile (" MSR CPSR_c,%0 "::"r" (interrupt_save) ); Other implementations, like the Cortex-M4, have a memory-clobber in the inline-assembly that restores the interrupt-posture: __asm__ volatile ("MSR PRIMASK,%0": : "r" (int_posture): "memory"); I can't test it right now, but I think it would be interesting to check if adding a memory-clobber in the Cortex-R5 port would fix this. My understanding is that it should prevent the compiler from reusing the cached value from the register. |
Nice observation. Yes, you found the issue, you are correct, the clobbers are wrong and the compiler fence is missing. I'm using BASEPRI for Cortex M33/M4 instead of CPSID, and there's actually a bug in threadX BASEPRI support that triggers the same issue as Cortex R5
Note the missing clobber, so no compiler fence is inserted. For non-BASEPRI approach (CPSID)
The barrier is present. So the barrier is there for approach using CPSID , but if you use BASEPRI, the memory fence/barrier is missing. The fix for BASEPRI (verified) is:
This fixes the issues for Cortex M33/M4 using BASEPRI. For anyone not using BASEPRI, this won't be an issue on Cortex M33/M4 as the non-BASEPRI path correctly inserts a compiler barrier. For Cortex R5, TX_DISABLE is defined as
This is incorrect, the fence is missing. It should be
Summary:
With the barriers in place, the problem will be gone. This is actually a big problem, these missing fences can trigger some really hard to find issues. |
What with ARC EM, do it need also to be changed? threadx/ports/arc_em/metaware/inc/tx_port.h Lines 306 to 307 in 485a02f
|
Looks like it's using a compiler built_in, the compiler built_in's will take care of the memory fences for you.
|
This commit fixes thread safety issue in tx_byte_allocate by correcting clobber list and adding missing compiler fence. This addresses potential race conditions when aggressive compiler optimization is enabled. Patched by Pat Kusbel. See eclipse-threadx#334
This commit fixes thread safety issue in tx_byte_allocate by correcting clobber list and adding missing compiler fence. This addresses potential race conditions when aggressive compiler optimization is enabled. Patched by Pat Kusbel. See eclipse-threadx#334
This commit fixes thread safety issue in tx_byte_allocate by correcting clobber list and adding missing compiler fence. This addresses potential race conditions when aggressive compiler optimization is enabled. See eclipse-threadx#334
The way we use byte pool
We use byte pool as heap in our project, the detail code in our project was copied from the following link which implemets CMSIS API
for threadx.
https://github.com/STMicroelectronics/stm32_mw_cmsis_rtos_tx/blob/main/cmsis_os2.c
In cmsis_os2.c, byte pool is created in MemInit() and after that, we can just use MemAlloc() and MemFree() to request and release memorry.
TX_BYTE_POOL HeapBytePool;
TX_BYTE_POOL StackBytePool;
TX_BLOCK_POOL BlockPool;
What happened?
MemAlloc() and MemFree() just worked fine at most of the time. But occationally, MemAlloc() failed with code of TX_NO_MEMORY . While, when we looked into this issue and after debugged a lot, we found that there was still enough memorry but the alloc algorithm not worked well.
Why this happened?
After days of debugging, we found that the value of tx_byte_pool_fragments went wrong when this issue happen.
To prove this, we defined a global varible of the same type, and do the same operation when tx_byte_pool_fragments increased or decreased. This is simple because tx_byte_pool_fragments only changes in the function _tx_byte_pool_search() which defined in threadx/threadx/common/src/tx_byte_pool_search.c.
It turned out that, when MemAlloc() failed with code of TX_NO_MEMORY , the value of tx_byte_pool_fragments in struct TX_BYTE_POOL was just different with the global varible we defined.
The root cause
After a deep dive into the source code of threadx, we found that _tx_byte_pool_search() can be interrupted after TX_RESTORE at line 258. This is a good design to let taskes with high priority alloc from the pool at first. But as mentioned above, The control block or so called handle was defined as global varibles, which may be changed by another high priority task.
Further explanation
Further proof
The following is the asm code of _tx_byte_pool_search(), when tx_byte_pool_fragments increases or decreases, it uses the value
from the register LR instead of re-load the value from ram which may be changed already.
The asm code may be different for the reason of different gcc version or MCU, in our case , the MCU is Cortex-R5 based and gcc version is arm-none-eabi-gcc 6.2.1 .
Looking forward for your reply!
The text was updated successfully, but these errors were encountered: