
better gc #3679

Open
@whyrusleeping

Description


I have a large enough IPFS repo (over 300 million blocks) that it takes a very large amount of memory to perform a GC; computing the marked set is expensive.

I'm thinking that using something like a bloom filter could make this process use much less memory, at the expense of not cleaning out every block. The difficulty is that false positives while enumerating the set of pinned objects could drastically lower performance (we could accidentally think a block that points to everything is pinned and end up cleaning out nothing), so selecting parameters to avoid this is important.
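A minimal sketch of what this could look like, assuming nothing about the actual go-ipfs pinner or blockstore APIs (all names here, `BloomFilter`, `Sweep`, etc., are hypothetical). The sizing uses the standard bloom filter formulas, and the key property is that false positives only cause garbage to survive: a marked block can never be deleted.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// BloomFilter is an illustrative bloom-filter-backed mark set,
// not the go-ipfs API.
type BloomFilter struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of hash functions
}

// NewBloomFilter sizes the filter for an expected n items and a target
// false-positive rate p, using m = -n*ln(p)/(ln 2)^2 and k = (m/n)*ln 2.
func NewBloomFilter(n uint64, p float64) *BloomFilter {
	m := uint64(math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)))
	k := uint64(math.Max(1, math.Round(float64(m)/float64(n)*math.Ln2)))
	return &BloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes derives k bit positions via double hashing over FNV-64a.
func (b *BloomFilter) hashes(key string) []uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h1 := h.Sum64()
	h.Write([]byte{0xff})
	h2 := h.Sum64() | 1 // odd step so all positions stay reachable
	out := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		out[i] = (h1 + i*h2) % b.m
	}
	return out
}

func (b *BloomFilter) Add(key string) {
	for _, pos := range b.hashes(key) {
		b.bits[pos/64] |= 1 << (pos % 64)
	}
}

func (b *BloomFilter) MightContain(key string) bool {
	for _, pos := range b.hashes(key) {
		if b.bits[pos/64]&(1<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}

// Sweep deletes every block that is definitely not in the mark set.
// A false positive means some garbage survives until the next gc;
// a pinned (marked) block is never deleted.
func Sweep(marked *BloomFilter, blocks map[string][]byte) (freed int) {
	for key := range blocks {
		if !marked.MightContain(key) {
			delete(blocks, key)
			freed++
		}
	}
	return freed
}

func main() {
	marked := NewBloomFilter(1000, 0.01)
	marked.Add("pinned-block")
	blocks := map[string][]byte{"pinned-block": nil, "garbage-1": nil, "garbage-2": nil}
	freed := Sweep(marked, blocks)
	fmt.Printf("freed %d blocks, %d remain\n", freed, len(blocks))
}
```

For 300 million blocks at a 1% false-positive rate, this works out to roughly 360 MB of bits, versus many gigabytes for an exact in-memory set of CIDs.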

Another (potentially more complicated but more accurate) option is to use a disk-backed prefix tree to store the enumerated pinset, with heavy caching up to some memory limit to keep performance tolerable. This just offloads the memory cost of storing the sets to disk, which is generally acceptable, but it would prevent people from running a GC when their disk was full, and that is generally considered a bad thing.

I'm also interested in strategies that can do "a little gc": something that lets us quickly free a smaller subset of the blocks without the overhead of a full gc scan. Implementing this may require rethinking how pinsets and objects are stored.
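One hedged reading of "a little gc" is a sweep with a budget: stop as soon as some number of unmarked blocks have been freed rather than scanning the whole blockstore. The sketch below assumes a mark-set oracle already exists (exact set, bloom filter, or disk index); `PartialSweep` and its parameters are hypothetical names, and a real version would also need to persist a scan cursor so successive runs cover different blocks.

```go
package main

import "fmt"

// PartialSweep walks the blockstore and deletes unmarked blocks, but
// stops once maxFree blocks have been freed. marked is any membership
// oracle over block keys.
func PartialSweep(marked func(string) bool, blocks map[string][]byte, maxFree int) int {
	freed := 0
	for key := range blocks {
		if freed >= maxFree {
			break
		}
		if !marked(key) {
			delete(blocks, key)
			freed++
		}
	}
	return freed
}

func main() {
	blocks := map[string][]byte{"a": nil, "b": nil, "c": nil, "pin": nil}
	pinned := map[string]bool{"pin": true}
	freed := PartialSweep(func(k string) bool { return pinned[k] }, blocks, 2)
	fmt.Println("freed:", freed, "remaining:", len(blocks)) // frees exactly 2 of the 3 unpinned blocks
}
```

The budget bounds the sweep, but note the mark phase is still the expensive part, which is why the issue suggests rethinking how pinsets are stored.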

Metadata

Labels: kind/enhancement (A net-new feature or improvement to an existing feature), need/community-input (Needs input from the wider community)
