Memory stays around after pickle cycle #43156
Comments
Another datapoint: running your script on macOS, I'm seeing a lot more being released at the end.
Also if I add a
Same with
Going through
FYI you should probably run this with `MALLOC_TRIM_THRESHOLD_=0 python -m memory_profiler memory_issue.py` on Linux, or `DYLD_INSERT_LIBRARIES=$(brew --prefix jemalloc)/lib/libjemalloc.dylib python -m memory_profiler memory_issue.py` on macOS, to encourage the allocator to release pages back to the OS. memory_profiler is just tracking the RSS of the process, nothing fancier, so it's possible the memory has been freed as far as pandas can get it, just not fully released back to the OS.

I believe that I have run this with MALLOC_TRIM_THRESHOLD_=0 already and saw the same results, but I should verify.
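One way to probe the allocator-retention theory from inside the process, rather than via environment variables, is to ask glibc to trim its heap directly. This is a minimal sketch, assuming Linux with glibc (`malloc_trim` is a glibc extension and not portable):

```python
import ctypes
import ctypes.util

# Load the C library; on glibc Linux this resolves to libc.so.6.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# malloc_trim(0) asks the allocator to return freed heap pages to the OS.
# It returns 1 if any memory was actually released, 0 otherwise.
released = libc.malloc_trim(0)
print("pages released:", bool(released))
```

If RSS drops noticeably after calling this at the end of the script, the memory was already free at the Python level and was only being cached by the allocator.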
Yes, same result on my Linux/Ubuntu machine running Mambaforge.
Is there a viable non-pickle alternative? When I change
Check out this response from Stack Overflow.
I tried libraries other than pickle: dill and joblib.

pickle library results:

```python
import pandas as pd
import numpy as np
import pickle
import dill
from io import BytesIO
import joblib
from memory_profiler import profile

@profile
def test1():
    df = pd.DataFrame(np.random.random((20000, 1000)))
    df["partitions"] = (df[0] * 10000).astype(int)
    _, groups = zip(*df.groupby("partitions"))
    del df
    groups = [pickle.dumps(group) for group in groups]
    groups = [pickle.loads(group) for group in groups]
    del groups
```

I also tried a different approach:

```python
groups = pickle.dumps(groups)
groups = pickle.loads(groups)
```

joblib library results:

```python
@profile
def test3():
    df = pd.DataFrame(np.random.random((20000, 1000)))
    df["partitions"] = (df[0] * 10000).astype(int)
    _, groups = zip(*df.groupby("partitions"))
    del df
    bytes_container = BytesIO()
    groups = joblib.dump(groups, bytes_container)
    bytes_container.seek(0)
    groups = bytes_container.read()
    del groups
```

dill library results:

```python
@profile
def test2():
    df = pd.DataFrame(np.random.random((20000, 1000)))
    df["partitions"] = (df[0] * 10000).astype(int)
    _, groups = zip(*df.groupby("partitions"))
    del df
    groups = dill.dumps(groups)
    groups = dill.loads(groups)
    del groups
```

I think this is not about the library we are using; Python holds on to extra memory for processing.
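One way to separate "Python is still holding objects" from "the C allocator is caching freed pages" is to compare RSS against `tracemalloc`, which tracks Python-level allocations only. This is a minimal sketch using plain lists as a stand-in for the pickled dataframe groups (the shapes and counts are illustrative, not the exact workload from the issue):

```python
import gc
import pickle
import tracemalloc

tracemalloc.start()

# Stand-in for the dataframe groups: many small pickle round-trips.
blobs = [pickle.dumps(list(range(1000))) for _ in range(1000)]
blobs = [pickle.loads(b) for b in blobs]

current_with, _ = tracemalloc.get_traced_memory()
del blobs
gc.collect()
current_after, _ = tracemalloc.get_traced_memory()

# If traced memory drops sharply after the del, Python itself released the
# objects; any RSS that remains high is being retained by the allocator.
print(current_after < current_with)
```

If `tracemalloc` shows the memory freed while memory_profiler's RSS stays flat, switching serialization libraries won't help, which matches the observation above.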
Hi Folks,
Related to #43155, I'm running into memory issues when pickling many small pandas dataframes. The following script creates a pandas dataframe, splits it up, pickles each little split, and then brings the pieces back again. It then deletes all objects from memory, but something is still sticking around. Here is the script, followed by the output of
memory_profiler
As you can see, we start with 70 MiB in memory and end with 550 MiB, despite all relevant objects having been released. The leak grows with the number of groups (scale the 10000 number to move the leak up or down). Any help or pointers on how to track this down would be welcome.
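For context, memory_profiler's numbers here are just the process's resident set size. This is a minimal sketch of the same measurement, assuming Linux (it parses `/proc/self/status`; the helper name is made up for illustration):

```python
def rss_kib():
    """Current resident set size of this process, in KiB (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # Line looks like "VmRSS:     12345 kB".
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc/self/status")

print(rss_kib())
```

Because RSS counts pages the allocator has kept for reuse, it can stay high even after every Python object has been freed, which is why the 550 MiB figure by itself doesn't distinguish a leak from allocator caching.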