Add movable supervisor allocations #3695

cwalther · 2020-11-15T19:54:04Z

This allows calls to allocate_memory() while the VM is running, it will then allocate from the GC heap (unless there is a suitable hole among the supervisor allocations), and when the VM exits and the GC heap is freed, the allocation will be moved to the bottom of the former GC heap and transformed into a proper supervisor allocation. Existing movable allocations will also be moved to defragment the supervisor heap and ensure that the next VM run gets as much memory as possible for the GC heap.

Allocations embedded in the GC heap are tracked using the same supervisor_allocation struct as proper supervisor allocations, transparently to clients. Clients can deal with moving memory either by holding onto their supervisor_allocation (which stays valid across the move) and getting its ptr (which changes) every time they need to dereference it, or if they held onto the ptr directly, get notified using a callback when it changes. The only time moves happen is at the end of a VM run.

This unifies a couple of existing ad-hoc mechanisms doing similar things with the terminal tilegrid, Sharp display framebuffer, and RGBMatrix, and provides the infrastructure for future applications of saving information from one VM run into the next, such as which file to run next (#1084 + #3454), exception traceback from the previous run (#1084, PR forthcoming), and USB descriptors (#1015).

Rationale for the internal design:

When also tracking embedded GC allocations, the order in the array of supervisor_allocations can no longer match the order in memory, as GC allocations can come in any order. The order, still needed by the regular supervisor heap administration, therefore needs to be tracked in some other way. Let’s use a linked list (or two, one for the low and one for the high side – that also allows treating them the same in some places).
GC allocations need to be covered by some root pointer to keep them from being collected. The linked list also comes in handy there, because it allows getting by with a single statically allocated root pointer rather than one per allocation: It points to the first block, and each block points to the next. For this to work, the next pointers need to be part of the allocated blocks themselves so that the GC can find them, rather than some external structure like supervisor_allocation. This introduces the supervisor_allocation_node struct, of which the data block given to the client is only a part.
At this point, we may just as well move the length from supervisor_allocation to supervisor_allocation_node too, so that it only takes up memory when actually used. To avoid putting the supervisor_allocation_node definition and the ALLOCATION_NODE macro in the public header, client access to the length is mediated by a function, get_allocation_length().
This also allows storing the MOVABLE flag (which unlike the existing HOLE flag can also be set on allocations that are in use by clients) in the length field without confusing clients.
The only remaining purpose of supervisor_allocation is for supervisor_allocation* to act as a “handle” or “pointer-to-pointer” to enable clients to deal with moving memory. Keeping it a single-field struct rather than changing it to typedef uint32_t* supervisor_allocation improves type safety somewhat.
For clients that expect a malloc-like API and want to hold onto block pointers directly instead of through a “handle”, such as Protomatter in the RGBMatrix case, there needs to be a way of mapping from old to new pointer during a move notification. This is achieved by making allocation_from_ptr() expect the old pointer during that time, with the old pointers set aside higher up in the stack.

tannewt

One minor thing. Overall it looks like a huge improvement. Thank you! I restarted the CI hoping to see how well it fits.

I'd also like @jepler to take a look. He did a number of the displayio uses.

shared-module/sharpdisplay/SharpMemoryFramebuffer.c

supervisor/shared/memory.c

cwalther · 2020-11-17T22:11:33Z

The added commit should fix the mimxrt10xx and esp32s2 failures (and not break the rest). Will absorb this into the first commit on the next rebase (unless there are objections).

However, could someone who is familiar with that port take a look at ports/cxd56/supervisor/port.c? Its port_fixed_stack() has previously returned NULL, but from the way the port_{stack|heap}_get_{limit|bottom|top}() functions look, I would say that it does have a fixed stack.

Regarding the code size, it occurs to me that I should #ifdef out code that is only needed for movable=true when no features are enabled that use that. That should hopefully take care of the still failing three M0 boards. So far I have only been concentrating on code size for the full build.

tannewt · 2020-11-18T19:09:22Z

@kamtom480 would know about the stack on the cxd56.

kamtom480 · 2020-11-18T21:48:04Z

@cwalther You're right. cxd56 has a fixed stack so function port_fixed_stack() should return true. Could you change it?

cwalther · 2020-11-18T21:59:32Z

OK, I’ll change it here. Would you also like a PR to change it in the current code, independently of this one? I’m not sure if this will go into the next stable release.

kamtom480 · 2020-11-18T22:50:53Z

@cwalther Ok, I understand. In that case, I will test this change tomorrow and if everything works, I will create PR.

cwalther · 2020-11-18T23:07:40Z

In the current code (before my change) it’s a bit more involved than just changing a false to a true, but you should be able to copy the code from the mimxrt10xx and esp32s2 ports. I can do it if you want, but I won’t be able to test it.

kamtom480 · 2020-11-19T14:17:52Z

@cwalther Thank you for finding this. I created PR.

cwalther · 2020-11-22T14:18:47Z

I was mistaken about the failing M0 boards – some of them include terminalio, so I can’t just remove the moving code for them. Maybe I can simplify it in the case that there is only one movable allocation.

jepler

I did not fully follow the implementation of allocate_memory_node or supervisor_move_memory, but I feel I understand them at the high level anyway, and the high level is solid. If there are bugs we can shake them out.

shared-module/rgbmatrix/RGBMatrix.c

jepler · 2020-11-26T03:58:32Z

I resolved the merge conflicts locally, but three builds still don't fit without other changes. I may look at it further to see if I can find the necessary savings. @cwalther is there any part of this we can disable on flash-constrained builds, or is it all-or-nothing?

This allows calls to `allocate_memory()` while the VM is running, it will then allocate from the GC heap (unless there is a suitable hole among the supervisor allocations), and when the VM exits and the GC heap is freed, the allocation will be moved to the bottom of the former GC heap and transformed into a proper supervisor allocation. Existing movable allocations will also be moved to defragment the supervisor heap and ensure that the next VM run gets as much memory as possible for the GC heap. By itself this breaks terminalio because it violates the assumption that supervisor_display_move_memory() still has access to an undisturbed heap to copy the tilegrid from. It will work in many cases, but if you're unlucky you will get garbled terminal contents after exiting from the vm run that created the display. This will be fixed in the following commit, which is separate to simplify review.

Moving memory is now done by the infrastructure and neither necessary nor correct here anymore.

Hybrid allocation is now part of the infrastructure. Moving memory contents would not be necessary because displayio can recreate them, but does not hurt.

It not only caused crashes with requests larger than 64K (can happen with RGBMatrix), but also generated a lot longer code than necessary.

Avoids wasted memory and makes it easier to keep track of who needs how much for future additions.

jepler · 2020-11-28T17:39:58Z

After talking to @dhalbert on Discord, he plans to disable complex arithmetic to get space for deep sleep. This may leave enough space to let us incorporate this pull request too.

cwalther · 2020-11-28T20:55:48Z

Thanks for the notice, I haven’t been on Discord for a few days.

I’m working on the code size, but with no great gains yet. I have one optimization that recovers 64 bytes, but not enough for the metro_m0_express. Will push that shortly, also fixing the conflicts.

I did not fully follow the implementation of allocate_memory_node or supervisor_move_memory, but I feel I understand them at the high level anyway, and the high level is solid.

Thanks for the review. For that matter, I think it’s important that some of you maintainers understand this code to some extent, because I can’t promise that I will always be around to maintain it (though I’ll do my best).

is there any part of this we can disable on flash-constrained builds, or is it all-or-nothing?

On the boards that use no movable allocations (that does not include the metro_m0_express), supervisor_move_memory() can be left out entirely. I’m thinking of introducing another counter like CIRCUITPY_SUPERVISOR_ALLOC_COUNT for that. I haven’t found any good optimizations for one movable allocation yet. Other than that, it’s pretty much all-or-nothing.

When no features are enabled that use movable allocations, supervisor_move_memory() is not needed.

jepler · 2020-11-29T19:58:49Z

Building this branch merged together with #3767, BOARD=metro_m0_express TRANSLATION=de_DE has 744 bytes free.

jepler

This is good to go except that we need 3767 to go in first so we have the flash space.

The goto issue was discussed and OK'd.

dhalbert · 2020-11-30T04:09:27Z

PR #3773 breaks out the space savings from #3767, so if it's merged this build should be OK.

cwalther · 2020-11-30T22:40:56Z

The last commit eliminates the goto and shortens the code by 4 bytes.

But do take a critical look at it if you want to merge right away, I may be too tired for such things right now. 🙂

tannewt

Looks good to me! Thank you!

This was referenced Nov 15, 2020

Add supervisor.set_next_code_file() #3454

Merged

Add supervisor.get_previous_traceback() #3696

Closed

tannewt self-requested a review November 16, 2020 23:20

tannewt previously requested changes Nov 16, 2020

View reviewed changes

shared-module/sharpdisplay/SharpMemoryFramebuffer.c Show resolved Hide resolved

supervisor/shared/memory.c Outdated Show resolved Hide resolved

tannewt requested a review from jepler November 16, 2020 23:42

tannewt added enhancement refactoring supervisor labels Nov 16, 2020

tannewt added this to the 6.x.0 - Features milestone Nov 16, 2020

kamtom480 mentioned this pull request Nov 19, 2020

Return fixed stack on Spresense #3725

Merged

jepler previously approved these changes Nov 26, 2020

View reviewed changes

shared-module/rgbmatrix/RGBMatrix.c Show resolved Hide resolved

shared-module/rgbmatrix/RGBMatrix.c Show resolved Hide resolved

cwalther added 6 commits November 28, 2020 17:50

Use movable allocation system for terminal tilegrid.

2ba9805

Moving memory is now done by the infrastructure and neither necessary nor correct here anymore.

Use movable allocation system for Sharp display framebuffer.

ac91220

Hybrid allocation is now part of the infrastructure. Moving memory contents would not be necessary because displayio can recreate them, but does not hurt.

Use movable allocation system for RGBMatrix allocations.

a4b84cf

Hybrid allocation is now part of the infrastructure. Moving memory contents would not be necessary because displayio can recreate them, but does not hurt.

Fix align32_size().

7ca36d4

It not only caused crashes with requests larger than 64K (can happen with RGBMatrix), but also generated a lot longer code than necessary.

Make CIRCUITPY_SUPERVISOR_ALLOC_COUNT dependent on enabled features.

993a581

Avoids wasted memory and makes it easier to keep track of who needs how much for future additions.

cwalther added 2 commits November 29, 2020 16:04

Unify redundant low/high_address computation to save a bit of code size.

9ecaa16

Optimize out allocation moving code on boards that don't need it.

11ed6f8

When no features are enabled that use movable allocations, supervisor_move_memory() is not needed.

cwalther dismissed jepler’s stale review via 11ed6f8 November 29, 2020 17:09

cwalther force-pushed the movable branch from 22497eb to 11ed6f8 Compare November 29, 2020 17:09

jepler requested a review from tannewt November 29, 2020 19:59

jepler previously approved these changes Nov 29, 2020

View reviewed changes

dhalbert mentioned this pull request Nov 29, 2020

Disable complex arithmetic on SAMD21 builds to make space #3773

Merged

Eliminate goto.

d6f8a43

cwalther dismissed jepler’s stale review via d6f8a43 November 30, 2020 22:38

tannewt approved these changes Dec 1, 2020

View reviewed changes

tannewt merged commit a975ef4 into adafruit:main Dec 1, 2020

deshipu mentioned this pull request Apr 14, 2021

Add time capsule to store strings between supervisor reloads #4597

Closed

Add movable supervisor allocations #3695

Add movable supervisor allocations #3695

Uh oh!

Conversation

cwalther commented Nov 15, 2020

Uh oh!

tannewt left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

cwalther commented Nov 17, 2020

Uh oh!

tannewt commented Nov 18, 2020

Uh oh!

kamtom480 commented Nov 18, 2020

Uh oh!

cwalther commented Nov 18, 2020

Uh oh!

kamtom480 commented Nov 18, 2020

Uh oh!

cwalther commented Nov 18, 2020

Uh oh!

kamtom480 commented Nov 19, 2020

Uh oh!

cwalther commented Nov 22, 2020

Uh oh!

jepler left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

jepler commented Nov 26, 2020

Uh oh!

jepler commented Nov 28, 2020

Uh oh!

cwalther commented Nov 28, 2020

Uh oh!

jepler commented Nov 29, 2020

Uh oh!

jepler left a comment

Choose a reason for hiding this comment

Uh oh!

dhalbert commented Nov 30, 2020

Uh oh!

cwalther commented Nov 30, 2020

Uh oh!

tannewt left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!