Introduce a long lived section of the heap. #547
Conversation
…loc. gc_alloc's API is changing and we shouldn't need to care about it. So, we switch to m_malloc which has the default behavior we expect.
string instead of the heap.
I'm still looking into fixing the tests, so sit tight.
I would like to go first :)
e0795f6 to 51e361c
It can now render the heap layout over a sequence of ram dumps. The mpy analysis is also better at parsing mpy files.
51e361c to da330f0
Ok, this is ready for review.
py/gc.c
Outdated
MP_STATE_MEM(gc_last_free_atb_index) = 0;
// Set last free ATB index to the end of the heap.
MP_STATE_MEM(gc_last_free_atb_index) = MP_STATE_MEM(gc_alloc_table_byte_len) - 1;
Lines 150 and 152 are both setting MP_STATE_MEM(gc_last_free_atb_index), so line 150 is wrong or redundant?
150 was wrong. Good catch!
py/gc_long_lived.c
Outdated
mp_raw_code_t* raw_code = MP_OBJ_TO_PTR(fun_bc->const_table[i]);
if (raw_code->kind == MP_CODE_BYTECODE) {
    raw_code->data.u_byte.bytecode = gc_make_long_lived((byte*) raw_code->data.u_byte.bytecode);
// TODO(tannewt): Do we actually want to recurse here? |
Is this still a question?
I'm still unsure about it but the comment isn't useful so I removed it.
py/gc_long_lived.c
Outdated
fun_bc->const_table = gc_make_long_lived((mp_uint_t*) fun_bc->const_table);
// extra_args stores keyword only argument default values.
size_t words = gc_nbytes(fun_bc) / sizeof(mp_uint_t*);
for (size_t i = 0; i < words - 4; i++) { |
What's the 4? Is that a number of bytes? Could it be 8 on 64-bit-word impls?
It's the number of pointers stored in mp_obj_fun_bc_t before the extra_args array. Is there another way to get the array length?
The struct defn is:
typedef struct _mp_obj_fun_bc_t {
mp_obj_base_t base;
mp_obj_dict_t *globals; // the context within which this function was defined
const byte *bytecode; // bytecode for the function
const mp_uint_t *const_table; // constant table
// the following extra_args array is allocated space to take (in order):
// - values of positional default args (if any)
// - a single slot for default kw args dict (if it has them)
// - a single slot for var args tuple (if it takes them)
// - a single slot for kw args dict (if it takes them)
mp_obj_t extra_args[];
} mp_obj_fun_bc_t;
I'm not sure why it doesn't say [4]. Then I think you could use sizeof(). And if it's a VLA (variable length array), you can use sizeof() also. Found this: https://stackoverflow.com/questions/14995870/behavior-of-sizeof-on-variable-length-arrays-c-only. But I'm not sure this is worth fixing.
I guess it's allocated separately and assigned there?
It's done through a cast, so I'm not sure if sizeof would work: https://github.com/adafruit/circuitpython/blob/master/py/objfun.c#L356
Fantastic!
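For what it's worth, here is a hedged sketch of the alternative floated above: deriving the header size from the struct layout with offsetof rather than hard-coding 4. This only illustrates the idea and assumes mp_obj_fun_bc_t comes from py/objfun.h; it is not what the PR ended up doing.

#include <stddef.h>
#include "py/objfun.h"  // for mp_obj_fun_bc_t (header name assumed)

// Number of pointer-sized words in mp_obj_fun_bc_t before the extra_args
// flexible array member. offsetof is valid on a struct with a flexible
// array member in C99, so this avoids the magic number 4.
#define FUN_BC_HEADER_WORDS (offsetof(mp_obj_fun_bc_t, extra_args) / sizeof(mp_obj_t))

// The loop bound from the diff above could then read:
//   size_t words = gc_nbytes(fun_bc) / sizeof(mp_uint_t*);
//   for (size_t i = 0; i < words - FUN_BC_HEADER_WORDS; i++) { ... }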
This adapts the allocation process to start from either end of the heap
when searching for free space. The default behavior is identical to the
existing behavior where it starts with the lowest block and looks higher.
Now it can also look from the highest block and lower depending on the
long_lived parameter to gc_alloc. As the heap fills, the two sections may
overlap. When they overlap, a collect may be triggered in order to keep
the long lived section compact. However, free space is always eligible
for each type of allocation.
The heap previously would end up looking something like:
[image: couldn't crisp 20k in]
Afterwards it's:
[image: krispy 20k with long lived]
Video of it working here: https://www.youtube.com/watch?v=S0uEZqxOWOc
By starting from either end of the heap we have the ability to separate
short lived objects from long lived ones. This separation reduces heap
fragmentation because long lived objects are easy to densely pack.
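A rough way to picture the search direction is the toy model below. The names and the bool bitmap are invented for illustration; the real py/gc.c works on its 2-bit allocation table and is more involved.

// Toy model of the two-ended search, for illustration only.
#include <stdbool.h>
#include <stddef.h>

#define NUM_BLOCKS 64
static bool block_used[NUM_BLOCKS];

// Find n free blocks, scanning low-to-high for short lived allocations and
// high-to-low for long lived ones. Returns the first block index or -1.
static int find_free_run(size_t n, bool long_lived) {
    if (n == 0 || n > NUM_BLOCKS) {
        return -1;
    }
    if (!long_lived) {
        for (size_t i = 0; i + n <= NUM_BLOCKS; i++) {
            size_t run = 0;
            while (run < n && !block_used[i + run]) {
                run++;
            }
            if (run == n) {
                return (int)i;
            }
        }
    } else {
        for (size_t i = NUM_BLOCKS; i >= n; i--) {
            size_t start = i - n;
            size_t run = 0;
            while (run < n && !block_used[start + run]) {
                run++;
            }
            if (run == n) {
                return (int)start;
            }
        }
    }
    return -1; // in the real allocator this is where a collect could be triggered
}

Short lived allocations then fill the heap from the bottom while long lived ones pack against the top, which is what keeps the long lived section dense.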
Most objects are short lived initially but may be made long lived when
they are referenced by a type or module. This involves copying the
memory and then letting the collect phase free the old portion.
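The promotion step can be pictured roughly as an allocate-from-the-top plus copy. This is a hedged sketch that assumes a gc_alloc taking a long_lived flag as described above; it is not a copy of the PR's actual gc_make_long_lived code.

#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include "py/gc.h"

// Illustrative sketch of promoting an object to the long lived section.
void *make_long_lived(void *ptr) {
    if (ptr == NULL) {
        return NULL;
    }
    size_t n_bytes = gc_nbytes(ptr);              // size of the existing block
    void *copy = gc_alloc(n_bytes, false, true);  // allocate from the high end of the heap
    if (copy == NULL) {
        return ptr;                               // out of memory: keep the old block
    }
    memcpy(copy, ptr, n_bytes);
    // The old block is intentionally not freed here; once nothing references
    // it any more, the next collect phase reclaims it, as described above.
    return copy;
}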
QSTR pools and chunks are always long lived because they are never freed.
The reallocation, collection and free processes are largely unchanged. They
simply also maintain an index to the highest free block as well as the lowest.
These indices are used to speed up the allocation search until the next collect.
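Continuing the toy sketch from above, that bookkeeping amounts to two cached search-start indices which allocations move inwards and which frees (or a sweep) push back outwards. Again, this is illustrative only, not the PR's exact code.

// Illustrative bookkeeping for the two cached search-start indices.
static size_t lowest_free_block;   // where the next short lived search begins
static size_t highest_free_block;  // where the next long lived search begins

// Called when blocks [start, end] are freed (e.g. during the sweep phase).
static void note_blocks_freed(size_t start, size_t end) {
    if (start < lowest_free_block) {
        lowest_free_block = start;
    }
    if (end > highest_free_block) {
        highest_free_block = end;
    }
}

// Called when blocks [start, end] are taken by an allocation.
static void note_blocks_taken(size_t start, size_t end, bool long_lived) {
    if (!long_lived && start == lowest_free_block) {
        lowest_free_block = end + 1;      // next short lived search starts higher
    }
    if (long_lived && end == highest_free_block && start > 0) {
        highest_free_block = start - 1;   // next long lived search starts lower
    }
}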
In practice, this change may slightly slow down import statements with the
benefit that memory is much less fragmented afterwards. For example, a test
import into a 20k heap that leaves ~6k free previously had a largest
contiguous free space of ~400 bytes. After this change, the largest contiguous
free space is over 3400 bytes.