Introduce a long lived section of the heap. #547

Merged: 5 commits, Jan 25, 2018

tannewt (Member) commented Jan 24, 2018

This adapts the allocation process to start from either end of the heap
when searching for free space. The default behavior is identical to the
existing behavior where it starts with the lowest block and looks higher.
Now it can also look from the highest block and lower depending on the
long_lived parameter to gc_alloc. As the heap fills, the two sections may
overlap. When they overlap, a collect may be triggered in order to keep
the long lived section compact. However, free space is always eligible
for either type of allocation.
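
The two search directions can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_heap_t`, `toy_alloc` and the 16-block table are hypothetical names made up for illustration, not the code in py/gc.c, and the sketch never triggers a collect when the two sections meet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NUM_BLOCKS 16

// Hypothetical toy heap: one flag per block, true when the block is in use.
typedef struct {
    bool used[TOY_NUM_BLOCKS];
} toy_heap_t;

static void toy_heap_init(toy_heap_t *heap) {
    memset(heap->used, 0, sizeof(heap->used));
}

// Returns true when blocks [start, start + n) are all free.
static bool toy_run_is_free(const toy_heap_t *heap, size_t start, size_t n) {
    for (size_t i = start; i < start + n; i++) {
        if (heap->used[i]) {
            return false;
        }
    }
    return true;
}

// Allocates n contiguous blocks and returns the index of the first one,
// or -1 if no free run is large enough. Short lived requests scan upward
// from block 0; long lived requests scan downward from the top block.
static int toy_alloc(toy_heap_t *heap, size_t n, bool long_lived) {
    assert(n > 0);
    if (n > TOY_NUM_BLOCKS) {
        return -1;
    }
    size_t last_start = TOY_NUM_BLOCKS - n;
    for (size_t step = 0; step <= last_start; step++) {
        size_t start = long_lived ? last_start - step : step;
        if (toy_run_is_free(heap, start, n)) {
            for (size_t i = start; i < start + n; i++) {
                heap->used[i] = true;
            }
            return (int)start;
        }
    }
    return -1;
}
```

In this model, short lived allocations fill in from block 0 and long lived ones from block 15, so free space tends to stay in one run in the middle. Either kind of request can still take any free run it finds.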

Previously, the heap would end up looking something like:
[image: heap_layout]

Afterwards it's:
[image: heap_layout1819]

Video of it working here: https://www.youtube.com/watch?v=S0uEZqxOWOc

By starting from either end of the heap, we are able to separate
short lived objects from long lived ones. This separation reduces heap
fragmentation because long lived objects are easy to densely pack.

Most objects are short lived initially but may be made long lived when
they are referenced by a type or module. This involves copying the
memory and then letting the collect phase free the old portion.
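
The copy-then-collect idea can be sketched with a toy arena. Everything here is an assumption made for illustration: `toy_arena_t` and `toy_make_long_lived` are hypothetical, the arena ignores alignment, and returning the old pointer when the long lived end is full is a choice of this sketch rather than a statement about what `gc_make_long_lived` does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 64

// Hypothetical arena: short lived data grows up from `low`,
// long lived data grows down from `high`.
typedef struct {
    unsigned char bytes[ARENA_SIZE];
    size_t low;   // index of the first free byte at the bottom
    size_t high;  // one past the last free byte at the top
} toy_arena_t;

static void toy_arena_init(toy_arena_t *a) {
    a->low = 0;
    a->high = ARENA_SIZE;
}

// Takes n bytes from the low end, or returns NULL when the ends would cross.
static void *toy_alloc_short(toy_arena_t *a, size_t n) {
    if (a->high - a->low < n) {
        return NULL;
    }
    void *p = &a->bytes[a->low];
    a->low += n;
    return p;
}

// Takes n bytes from the high end, or returns NULL when the ends would cross.
static void *toy_alloc_long(toy_arena_t *a, size_t n) {
    if (a->high - a->low < n) {
        return NULL;
    }
    a->high -= n;
    return &a->bytes[a->high];
}

// Copies n bytes into the long lived end and returns the new location.
// The old copy is not freed here; a collector would reclaim it later,
// once nothing references it. Falls back to the old pointer on failure.
static void *toy_make_long_lived(toy_arena_t *a, void *old_ptr, size_t n) {
    assert(old_ptr != NULL);
    void *new_ptr = toy_alloc_long(a, n);
    if (new_ptr == NULL) {
        return old_ptr;
    }
    memcpy(new_ptr, old_ptr, n);
    return new_ptr;
}
```

The caller has to store the returned pointer in place of the old one, which is the pattern visible in the review snippets further down (`x = gc_make_long_lived(x)`).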

QSTR pools and chunks are always long lived because they are never freed.

The reallocation, collection and free processes are largely unchanged. They
simply also maintain an index to the highest free block as well as the lowest.
These indices are used to speed up the allocation search until the next collect.

In practice, this change may slightly slow down import statements, with the
benefit that memory is much less fragmented afterwards. For example, a test
import into a 20k heap that leaves ~6k free previously had a largest
contiguous free space of ~400 bytes. After this change, the largest contiguous
free space is over 3400 bytes.
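
The "largest contiguous free space" figure can be thought of as the longest run of free blocks. A minimal sketch of that measurement, assuming a hypothetical per-block `used` table (this is not how the numbers above were produced):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Returns the length, in blocks, of the longest run of free entries
// in a hypothetical usage table where true means "block in use".
static size_t largest_free_run(const bool *used, size_t num_blocks) {
    assert(used != NULL);
    size_t best = 0;
    size_t current = 0;
    for (size_t i = 0; i < num_blocks; i++) {
        if (used[i]) {
            current = 0;  // a used block ends the current free run
        } else {
            current++;
            if (current > best) {
                best = current;
            }
        }
    }
    return best;
}
```

Two tables with the same number of free blocks can give very different results: an alternating used/free pattern yields a longest run of 1, while the same free blocks grouped together yield one long run.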

…loc.

gc_alloc's API is changing and we shouldn't need to care about it.
So, we switch to m_malloc which has the default behavior we expect.
tannewt added this to the 3.0 milestone Jan 24, 2018
tannewt requested a review from dhalbert January 24, 2018 01:18
tannewt (Member Author) commented Jan 24, 2018

I'm still looking into fixing the tests so sit tight.

dhalbert (Collaborator)

I would like to go first :)

It can now render the heap layout over a sequence of ram dumps.

The mpy analysis is also better at parsing mpy files.
tannewt (Member Author) commented Jan 24, 2018

Ok, this is ready for review.

py/gc.c Outdated
MP_STATE_MEM(gc_last_free_atb_index) = 0;
// Set last free ATB index to the end of the heap.
MP_STATE_MEM(gc_last_free_atb_index) = MP_STATE_MEM(gc_alloc_table_byte_len) - 1;
Collaborator

Lines 150 and 152 both set MP_STATE_MEM(gc_last_free_atb_index), so is line 150 wrong or redundant?

Member Author

150 was wrong. Good catch!

mp_raw_code_t* raw_code = MP_OBJ_TO_PTR(fun_bc->const_table[i]);
if (raw_code->kind == MP_CODE_BYTECODE) {
raw_code->data.u_byte.bytecode = gc_make_long_lived((byte*) raw_code->data.u_byte.bytecode);
// TODO(tannewt): Do we actually want to recurse here?
Collaborator

Is this still a question?

Member Author

I'm still unsure about it but the comment isn't useful so I removed it.

fun_bc->const_table = gc_make_long_lived((mp_uint_t*) fun_bc->const_table);
// extra_args stores keyword only argument default values.
size_t words = gc_nbytes(fun_bc) / sizeof(mp_uint_t*);
for (size_t i = 0; i < words - 4; i++) {
Collaborator

What's the 4? Is that a number of bytes? Could it be 8 on 64-bit-word implementations?

Member Author

It's the number of pointers stored in mp_obj_fun_bc_t before the extra_args array. Is there another way to get the array length?

Collaborator

The struct defn is:

typedef struct _mp_obj_fun_bc_t {
    mp_obj_base_t base;
    mp_obj_dict_t *globals;         // the context within which this function was defined
    const byte *bytecode;           // bytecode for the function
    const mp_uint_t *const_table;   // constant table
    // the following extra_args array is allocated space to take (in order):
    //  - values of positional default args (if any)
    //  - a single slot for default kw args dict (if it has them)
    //  - a single slot for var args tuple (if it takes them)
    //  - a single slot for kw args dict (if it takes them)
    mp_obj_t extra_args[];
} mp_obj_fun_bc_t;

I'm not sure why it doesn't say [4]. Then I think you could use sizeof(). And if it's a VLA (variable length array), you can use sizeof() also. Found this: https://stackoverflow.com/questions/14995870/behavior-of-sizeof-on-variable-length-arrays-c-only.

But I'm not sure this is worth fixing.

Collaborator

I guess it's allocated separately and assigned to there?

Member Author

It's done through a cast so I'm not sure if sizeof would work: https://github.com/adafruit/circuitpython/blob/master/py/objfun.c#L356
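
For readers following this thread: a trailing `extra_args[]` member with no size is a C99 flexible array member, and `sizeof` on the struct does not count it. One way to avoid the literal 4 is `offsetof`. The sketch below uses a hypothetical stand-in struct (`toy_fun_bc_t`, with field types simplified to plain pointers), not the real `mp_obj_fun_bc_t`, and it assumes the allocation size is known exactly; an allocator that rounds sizes up to whole blocks would make the computed count an upper bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical stand-in for mp_obj_fun_bc_t, simplified to pointer fields.
typedef struct {
    const void *base_type;
    void *globals;
    const unsigned char *bytecode;
    const size_t *const_table;
    void *extra_args[];  // flexible array member: not counted by sizeof
} toy_fun_bc_t;

// Number of pointer-sized words that precede extra_args.
#define TOY_HEADER_WORDS (offsetof(toy_fun_bc_t, extra_args) / sizeof(void *))

// Allocates a zeroed object with room for n_extra trailing entries.
static toy_fun_bc_t *toy_fun_bc_new(size_t n_extra) {
    return calloc(1, sizeof(toy_fun_bc_t) + n_extra * sizeof(void *));
}

// Recovers the number of extra_args entries from the allocation size,
// without hard-coding how many header words come first.
static size_t toy_extra_args_len(size_t alloc_bytes) {
    assert(alloc_bytes >= offsetof(toy_fun_bc_t, extra_args));
    return (alloc_bytes - offsetof(toy_fun_bc_t, extra_args)) / sizeof(void *);
}
```

On typical targets where all four header fields are pointer sized, `TOY_HEADER_WORDS` evaluates to 4 on both 32-bit and 64-bit builds, because it counts words rather than bytes.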

dhalbert (Collaborator) left a comment

Fantastic!
