Increase number of LWIP timers for MDNS (fixes #7333) #7589
The proximate cause of this issue is a memory allocation that fails.
This raises two questions:
Why did the memory allocation fail?
LWIP uses statically sized arrays from which it allocates each type of struct. In this case we exhausted the static array for the timeout struct. The size of the array is determined by MEMP_NUM_SYS_TIMEOUT, which has some preprocessor magic to calculate the maximum number of allocations needed, so we should never run out. However, this macro only knows about stuff in lwip/core; it doesn't know about lwip/apps.
From lwip/apps, we use mdns. In the code for mdns, there is this well hidden comment:
So we need to manually increase the value of MEMP_NUM_SYS_TIMEOUT in lwip/core from 7 to 8 and unfortunately we can't use the magic macro calculation anymore.
But it doesn't end there! A different mdns file allocates a further 3 timers per IP version. Unlike the first file, it doesn't appear to have a comment telling you to increase MEMP_NUM_SYS_TIMEOUT to compensate.
So that is how the new value of MEMP_NUM_SYS_TIMEOUT is calculated and it is sufficient to fix the issue.
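As a rough illustration of the reasoning above, here is a minimal sketch of a fixed-size pool and the timer budget. It is not LWIP code: timeout_pool, pool_init, pool_alloc, pool_alloc_many, timers_needed and MAX_SLOTS are hypothetical names, and the number of enabled IP versions is an assumption supplied by the caller. Only the figures 7, 1 and 3 per IP version come from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 32  /* hypothetical upper bound for this sketch only */

/* A statically sized pool; capacity plays the role of MEMP_NUM_SYS_TIMEOUT. */
typedef struct {
    int in_use[MAX_SLOTS];
    size_t capacity;
} timeout_pool;

void pool_init(timeout_pool *pool, size_t capacity) {
    assert(capacity <= MAX_SLOTS);
    pool->capacity = capacity;
    for (size_t i = 0; i < MAX_SLOTS; i++) {
        pool->in_use[i] = 0;
    }
}

/* Returns a slot index, or -1 once the pool is exhausted. */
int pool_alloc(timeout_pool *pool) {
    for (size_t i = 0; i < pool->capacity; i++) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = 1;
            return (int)i;
        }
    }
    return -1;
}

/* Timer budget as described above: the core count, plus 1 for mdns,
 * plus 3 per enabled IP version. */
size_t timers_needed(size_t core_timers, size_t ip_versions) {
    return core_timers + 1 + 3 * ip_versions;
}

/* Tries to allocate `wanted` timers and returns how many succeeded. */
size_t pool_alloc_many(timeout_pool *pool, size_t wanted) {
    size_t granted = 0;
    for (size_t i = 0; i < wanted; i++) {
        if (pool_alloc(pool) >= 0) {
            granted++;
        }
    }
    return granted;
}
```

Assuming both IPv4 and IPv6 are enabled, timers_needed(7, 2) is 14, so a pool sized for the core-only count of 7 satisfies only 7 of those requests and every later allocation fails.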
Why wasn't the failure more fatal?
When the memory allocation fails, the code calls LWIP_ASSERT, which in turn calls LWIP_PLATFORM_ASSERT, which is defined here:
The problem is that the parameter, x, to LWIP_PLATFORM_ASSERT is not the assert predicate, but the assert message. The message string constant is statically non-null, so this whole macro simplifies into a no-op instead of an infinite loop. This means the assert after the memory allocation failure does nothing and the program continues normally instead of halting, as would be appropriate for this non-recoverable error.
The fix for this is of course trivial; it's not necessary to fix #7333, but it is useful to catch similar bugs earlier. Currently Adafruit does not maintain its own fork of pico-sdk, so I will submit the PR to Raspberry Pi, unless someone thinks it is worthwhile to create the fork.
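The LWIP_PLATFORM_ASSERT problem described above can be reproduced in isolation. This is a sketch under assumptions, not the pico-sdk definition: BROKEN_PLATFORM_ASSERT, FIXED_PLATFORM_ASSERT, ASSERT_BROKEN, ASSERT_FIXED, platform_halt and halt_count are hypothetical names, and a counter stands in for the infinite loop so the behaviour can be observed. The two-argument (message, assertion) shape mirrors how LWIP_ASSERT is invoked.

```c
#include <assert.h>
#include <stddef.h>

int halt_count = 0;  /* how many times the platform assert would have halted */

/* Stand-in for halting; a real port would loop forever or panic here. */
void platform_halt(const char *msg) {
    assert(msg != NULL);
    halt_count++;
}

/* Broken form: treats its argument as a predicate. The argument is really
 * the message string, which is never null, so this never halts. */
#define BROKEN_PLATFORM_ASSERT(x) \
    do { if (!(x)) { platform_halt("assert"); } } while (0)

/* Fixed form: the predicate was already checked by the caller, so the
 * argument is only a message and the halt is unconditional. */
#define FIXED_PLATFORM_ASSERT(x) \
    do { platform_halt(x); } while (0)

/* Both wrappers check the predicate, then hand the message to the platform. */
#define ASSERT_BROKEN(message, assertion) \
    do { if (!(assertion)) { BROKEN_PLATFORM_ASSERT(message); } } while (0)
#define ASSERT_FIXED(message, assertion) \
    do { if (!(assertion)) { FIXED_PLATFORM_ASSERT(message); } } while (0)

/* Simulates the failed timeout allocation and reports the number of halts. */
int halts_with_broken_assert(void) {
    void *timeout = NULL;  /* pretend the pool was exhausted */
    halt_count = 0;
    ASSERT_BROKEN("timeout != NULL, pool is empty", timeout != NULL);
    return halt_count;
}

int halts_with_fixed_assert(void) {
    void *timeout = NULL;  /* pretend the pool was exhausted */
    halt_count = 0;
    ASSERT_FIXED("timeout != NULL, pool is empty", timeout != NULL);
    return halt_count;
}
```

With the broken form the failed allocation goes unnoticed (zero halts), while the fixed form halts once. Some compilers may warn that the string literal test in the broken macro is always true, which is exactly the defect being illustrated.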