Fix unbounded memory usage in pickle environment #912

Merged
1 commit merged into useblocks:master on May 4, 2023

Conversation

@vpoughonEZ (Contributor) commented Apr 19, 2023

In my project we use incremental builds and often build in parallel with sphinx-build -j auto. We use sphinx-needs a lot, and in this configuration we've noticed huge memory usage during compilation, sometimes so high (30+ GB) that it crashes my laptop. After some debugging, it turns out it comes from the size of the pickled environment used to store information between incremental builds.

This file is typically found in your build directory as environment.pickle. The problem is that it grows after every incremental parallel build. This can be seen by rebuilding our project with something like:

grep -Rl "\.\. need" | xargs touch
sphinx-build -j auto [...]

When doing this, we notice that the file size of environment.pickle increases every time. Of course, when a build is run, this file is unpickled into RAM by each sphinx-build process. Looking more closely at the cause, it comes from env.needs_all_docs["all"]. This list is appended to correctly in add_doc(), which only adds a docname if it is not already present:

if docname not in env.needs_all_docs["all"]:
    env.needs_all_docs["all"].append(docname)

However, merge_data() concatenates both lists without checking for duplicates. This can result in an unbounded increase in the size of the list stored in the build environment:

if other_key in objects:
    objects[other_key] += other_value
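
The effect of this unconditional += can be reproduced with a small standalone simulation (hypothetical names, not actual sphinx-needs code): each parallel worker starts from a copy of the main list, and merging it back without deduplication doubles the list on every build.

```python
# Minimal simulation of the duplicating merge (illustrative only).
main = ["doc_a", "doc_b"]

for build in range(3):
    # Each parallel worker environment starts as a copy of the main one...
    worker = list(main)
    # ...and the naive merge appends it back without deduplication:
    main += worker

print(len(main))  # doubles on every merge: 2 -> 4 -> 8 -> 16
```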

This can also be seen by running this simple script in your build directory:

import pickle

with open("path/to/your/environment.pickle", "rb") as f:
    env = pickle.load(f)

print('len(env.needs_all_docs["all"]) =', len(env.needs_all_docs["all"]))

This PR fixes this issue by merging lists without duplicates in merge_data().
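
A duplicate-free merge that preserves insertion order can be sketched like this (an illustrative helper, not the exact code in the PR diff):

```python
# Sketch of a duplicate-free, order-preserving list merge (hypothetical helper).
def merge_unique(current, other):
    """Append items from `other` that are not already in `current`."""
    for item in other:
        if item not in current:
            current.append(item)
    return current

docs = ["index", "usage"]
merge_unique(docs, ["usage", "api", "index"])
print(docs)  # ['index', 'usage', 'api']
```

With this approach, repeated merges of the same docnames leave the list size bounded by the number of distinct documents.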

@vpoughonEZ vpoughonEZ force-pushed the fix-merge-data-all-docs branch from 9a34603 to c8ace59 Compare April 19, 2023 15:08
@danwos (Member) commented May 2, 2023

Thanks for the PR and the great explanation 👍

Will merge it as soon as the CI is green.

@danwos (Member) commented May 2, 2023

Ok, the CI problems are not your fault.
A new flake8 version introduced some new checks.
Will solve them soon.

@danwos danwos force-pushed the fix-merge-data-all-docs branch from c8ace59 to f6a44e4 Compare May 4, 2023 16:22
@danwos danwos merged commit c27057e into useblocks:master May 4, 2023