Skip to content

Add movable supervisor allocations #3695

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Dec 1, 2020
Merged

Add movable supervisor allocations #3695

merged 9 commits into from
Dec 1, 2020

Conversation

cwalther
Copy link

This allows calls to allocate_memory() while the VM is running, it will then allocate from the GC heap (unless there is a suitable hole among the supervisor allocations), and when the VM exits and the GC heap is freed, the allocation will be moved to the bottom of the former GC heap and transformed into a proper supervisor allocation. Existing movable allocations will also be moved to defragment the supervisor heap and ensure that the next VM run gets as much memory as possible for the GC heap.

Allocations embedded in the GC heap are tracked using the same supervisor_allocation struct as proper supervisor allocations, transparently to clients. Clients can deal with moving memory either by holding onto their supervisor_allocation (which stays valid across the move) and getting its ptr (which changes) every time they need to dereference it, or if they held onto the ptr directly, get notified using a callback when it changes. The only time moves happen is at the end of a VM run.

This unifies a couple of existing ad-hoc mechanisms doing similar things with the terminal tilegrid, Sharp display framebuffer, and RGBMatrix, and provides the infrastructure for future applications of saving information from one VM run into the next, such as which file to run next (#1084 + #3454), exception traceback from the previous run (#1084, PR forthcoming), and USB descriptors (#1015).

Rationale for the internal design:

  • When also tracking embedded GC allocations, the order in the array of supervisor_allocations can no longer match the order in memory, as GC allocations can come in any order. The order, still needed by the regular supervisor heap administration, therefore needs to be tracked in some other way. Let’s use a linked list (or two, one for the low and one for the high side – that also allows treating them the same in some places).
  • GC allocations need to be covered by some root pointer to keep them from being collected. The linked list also comes in handy there, because it allows getting by with a single statically allocated root pointer rather than one per allocation: It points to the first block, and each block points to the next. For this to work, the next pointers need to be part of the allocated blocks themselves so that the GC can find them, rather than some external structure like supervisor_allocation. This introduces the supervisor_allocation_node struct, of which the data block given to the client is only a part.
  • At this point, we may just as well move the length from supervisor_allocation to supervisor_allocation_node too, so that it only takes up memory when actually used. To avoid putting the supervisor_allocation_node definition and the ALLOCATION_NODE macro in the public header, client access to the length is mediated by a function, get_allocation_length().
  • This also allows storing the MOVABLE flag (which unlike the existing HOLE flag can also be set on allocations that are in use by clients) in the length field without confusing clients.
  • The only remaining purpose of supervisor_allocation is for supervisor_allocation* to act as a “handle” or “pointer-to-pointer” to enable clients to deal with moving memory. Keeping it a single-field struct rather than changing it to typedef uint32_t* supervisor_allocation improves type safety somewhat.
  • For clients that expect a malloc-like API and want to hold onto block pointers directly instead of through a “handle”, such as Protomatter in the RGBMatrix case, there needs to be a way of mapping from old to new pointer during a move notification. This is achieved by making allocation_from_ptr() expect the old pointer during that time, with the old pointers set aside higher up in the stack.

Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor thing. Overall it looks like a huge improvement. Thank you! I restarted the CI hoping to see how well it fits.

I'd also like @jepler to take a look. He did a number of the displayio uses.

@cwalther
Copy link
Author

The added commit should fix the mimxrt10xx and esp32s2 failures (and not break the rest). Will absorb this into the first commit on the next rebase (unless there are objections).

However, could someone who is familiar with that port take a look at ports/cxd56/supervisor/port.c? Its port_fixed_stack() has previously returned NULL, but from the way the port_{stack|heap}_get_{limit|bottom|top}() functions look, I would say that it does have a fixed stack.

Regarding the code size, it occurs to me that I should #ifdef out code that is only needed for movable=true when no features are enabled that use that. That should hopefully take care of the still failing three M0 boards. So far I have only been concentrating on code size for the full build.

@tannewt
Copy link
Member

tannewt commented Nov 18, 2020

@kamtom480 would know about the stack on the cxd56.

@kamtom480
Copy link

@cwalther You're right. cxd56 has a fixed stack so function port_fixed_stack() should return true. Could you change it?

@cwalther
Copy link
Author

OK, I’ll change it here. Would you also like a PR to change it in the current code, independently of this one? I’m not sure if this will go into the next stable release.

@kamtom480
Copy link

@cwalther Ok, I understand. In that case, I will test this change tomorrow and if everything works, I will create PR.

@cwalther
Copy link
Author

In the current code (before my change) it’s a bit more involved than just changing a false to a true, but you should be able to copy the code from the mimxrt10xx and esp32s2 ports. I can do it if you want, but I won’t be able to test it.

@kamtom480
Copy link

@cwalther Thank you for finding this. I created PR.

@cwalther
Copy link
Author

I was mistaken about the failing M0 boards – some of them include terminalio, so I can’t just remove the moving code for them. Maybe I can simplify it in the case that there is only one movable allocation.

jepler
jepler previously approved these changes Nov 26, 2020
Copy link

@jepler jepler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not fully follow the implementation of allocate_memory_node or supervisor_move_memory, but I feel I understand them at the high level anyway, and the high level is solid. If there are bugs we can shake them out.

@jepler
Copy link

jepler commented Nov 26, 2020

I resolved the merge conflicts locally, but three builds still don't fit without other changes. I may look at it further to see if I can find the necessary savings. @cwalther is there any part of this we can disable on flash-constrained builds, or is it all-or-nothing?

This allows calls to `allocate_memory()` while the VM is running, it will then allocate from the GC heap (unless there is a suitable hole among the supervisor allocations), and when the VM exits and the GC heap is freed, the allocation will be moved to the bottom of the former GC heap and transformed into a proper supervisor allocation. Existing movable allocations will also be moved to defragment the supervisor heap and ensure that the next VM run gets as much memory as possible for the GC heap.

By itself this breaks terminalio because it violates the assumption that supervisor_display_move_memory() still has access to an undisturbed heap to copy the tilegrid from. It will work in many cases, but if you're unlucky you will get garbled terminal contents after exiting from the vm run that created the display. This will be fixed in the following commit, which is separate to simplify review.
Moving memory is now done by the infrastructure and neither necessary nor correct here anymore.
Hybrid allocation is now part of the infrastructure. Moving memory contents would not be necessary because displayio can recreate them, but does not hurt.
Hybrid allocation is now part of the infrastructure. Moving memory contents would not be necessary because displayio can recreate them, but does not hurt.
It not only caused crashes with requests larger than 64K (can happen with RGBMatrix), but also generated a lot longer code than necessary.
Avoids wasted memory and makes it easier to keep track of who needs how much for future additions.
@jepler
Copy link

jepler commented Nov 28, 2020

After talking to @dhalbert on Discord, he plans to disable complex arithmetic to get space for deep sleep. This may leave enough space to let us incorporate this pull request too.

@cwalther
Copy link
Author

Thanks for the notice, I haven’t been on Discord for a few days.

I’m working on the code size, but with no great gains yet. I have one optimization that recovers 64 bytes, but not enough for the metro_m0_express. Will push that shortly, also fixing the conflicts.

I did not fully follow the implementation of allocate_memory_node or supervisor_move_memory, but I feel I understand them at the high level anyway, and the high level is solid.

Thanks for the review. For that matter, I think it’s important that some of you maintainers understand this code to some extent, because I can’t promise that I will always be around to maintain it (though I’ll do my best).

is there any part of this we can disable on flash-constrained builds, or is it all-or-nothing?

On the boards that use no movable allocations (that does not include the metro_m0_express), supervisor_move_memory() can be left out entirely. I’m thinking of introducing another counter like CIRCUITPY_SUPERVISOR_ALLOC_COUNT for that. I haven’t found any good optimizations for one movable allocation yet. Other than that, it’s pretty much all-or-nothing.

When no features are enabled that use movable allocations, supervisor_move_memory() is not needed.
@jepler
Copy link

jepler commented Nov 29, 2020

Building this branch merged together with #3767, BOARD=metro_m0_express TRANSLATION=de_DE has 744 bytes free.

@jepler jepler requested a review from tannewt November 29, 2020 19:59
jepler
jepler previously approved these changes Nov 29, 2020
Copy link

@jepler jepler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good to go except that we need 3767 to go in first so we have the flash space.

@jepler jepler dismissed tannewt’s stale review November 29, 2020 20:00

The goto issue was discussed and OK'd.

@dhalbert
Copy link
Collaborator

PR #3773 breaks out the space savings from #3767, so if it's merged this build should be OK.

@cwalther
Copy link
Author

The last commit eliminates the goto and shortens the code by 4 bytes.

But do take a critical look at it if you want to merge right away, I may be too tired for such things right now. 🙂

Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants