Add jemalloc, lock Rust version #1623

Conversation
Short version:

We have a memory issue after switching to 1.32, which came from the allocator switch. 1.31 used jemalloc 4; jemallocator uses jemalloc 5, which behaves differently. With jemalloc 5 as configured in this PR we still have an issue that needs to be addressed eventually, but the behavior is improved to the point where I'm not concerned about it causing a service disruption. My current theory points to the problem coming from civet, which we're removing soon anyway. Looking into this further will require a lot of time and effort, so I'd like to see whether we get a major improvement from removing civet before going further. I believe that the behavior we're seeing with this config is the same as a rare behavior we saw on Rust 1.31, so I do not think this is any riskier than locking back down to 1.31. This is also what has been in production for the last several days as we've been debugging.

The reason for the `let _` when enabling background threads is that they're not supported on all platforms (notably macOS). The function is safe to call on unsupported platforms. Folks who are more familiar with jemalloc than I am recommended this over `#[cfg(not(target_os = "macos"))]` (see the sketch below).

This PR also locks down the Rust version so we don't unexpectedly have issues from a version upgrade mixed with code changes again, or an unexpected full build with no cache during an incident. I expect that we will just bump this immediately when a new stable version comes out, but we should make sure we always deploy right after bumping it.

Everything after this is more detail on what I've said above, and/or documents what I've found so it doesn't get lost.
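To make that concrete, here is a minimal sketch of this kind of allocator setup. It is an illustration rather than the exact code in this PR, and both the `jemalloc_ctl` call and the Cargo feature named in the comment are assumptions that may vary between crate versions:

```rust
// Assumed Cargo.toml entries (not taken from this PR):
//   jemallocator = { version = "*", features = ["unprefixed_malloc_on_supported_platforms"] }
//   jemalloc-ctl = "*"

// Route all allocations through jemalloc 5 instead of the system allocator.
#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn main() {
    // Background threads aren't supported on every platform (notably macOS),
    // but the call is safe to make there; it simply returns an error, so we
    // ignore the result instead of cfg-gating the call.
    let _ = jemalloc_ctl::background_thread::write(true);

    // ... start the web server as usual ...
}
```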
Long version:

After upgrading to Rust 1.32, we saw what appeared to be a large memory leak. I've been spending a lot of time over the last week attempting to narrow this down, as well as come up with a permanent fix.

The fact that the 1.32 upgrade caused this makes sense, as it changed the default allocator from jemalloc 4 to the system allocator. What doesn't make sense is why the system allocator would cause us to have a major memory leak, while jemalloc 4 gives us completely stable usage and jemalloc 5 does something completely different (more on that later).

From looking into this we've learned a few interesting things:

- We can run out of memory and it's completely fine.
  - If we run out of memory, we will start swapping to disk, which in theory should cause major perf issues, but in practice it barely had a noticeable effect on response times.
- We are more affected by our choice of allocator than we thought.
- Our memory usage is in fact coming from our server, not nginx.
  - I've long wondered why exactly we were using so much memory, but haven't had time to look into it in detail until now. I'd always assumed it was nginx's buffers, but this makes it clear that most or all of our memory usage is in fact coming from the server.
- Whatever is causing this issue seems to require production levels of traffic, as we haven't been able to reproduce locally or in staging.
- The underlying problem almost certainly comes from one of our C dependencies, which I'll elaborate on below.
- We are reading and writing pages to disk (aka swapping) even when we're nowhere near max memory usage. My understanding is this should not be happening.
- Our allocations don't appear to correlate with crate uploads, which is the only time I'd expect us to be allocating a ton of memory. The spikes *sometimes* come from uploads, but not always. And the amount that gets allocated is way larger than the size of the crate being uploaded (a 0.18MB crate correlated with a 113MB increase in RSS).

Here's the behavior we've seen with various allocators:

- 1.31 default allocator (jemalloc 4, unprefixed)
  - Memory usage very rapidly grows to *some* number (usually between 200MB and 350MB) and remains extremely stable at that number until the server restarts. The number it grows to is different every restart, but whatever number it reaches is the same between both servers.
  - On rare occasions, there is a spike in memory usage around 1-2 hours before restart. I've suspected in the past this is related to the connection pool exhaustion issue, since IIRC memory usage spiked the last time we had that issue occur in production. The spike only affects a single server when it happens.
- 1.32 default allocator (system allocator)
  - Memory usage grows rapidly and unbounded, consuming all available RAM within 3 hours and continuing to grow into swap beyond that. I do not know if it eventually reaches an upper bound, as I've never let it run for long after it begins swapping.
- jemallocator (jemalloc 5, prefixed, no background threads)
  - Similar to the system allocator, but much slower growth.
- jemallocator unprefixed (jemalloc 5, unprefixed, no background threads)
  - Memory usage grows to ~150MB, and is either stable or very slowly grows towards ~200MB. Memory usage is similar between servers, but not exactly the same. "Stable" in this context means not obviously leaking or growing in unreasonable ways, but it moves around much more than jemalloc 4 did (25MB or so vs. <5MB).
  - It's almost always only one server that grows, but I've seen both grow at least once.
  - After 16 hours, memory usage grows extremely rapidly on one or occasionally both servers to some number (usually between 400MB and 520MB) and stays there until server restart.
    - The number it grows to appears to be roughly 2.25x the usage before the growth. And by roughly I mean within 1-2% of that number, and it exhibits this behavior extremely consistently.
  - Because unprefixing jemalloc's symbols caused the change here, it's pretty clear that it's a leak in memory allocated by one of our C dependencies. We have quite a few of these, but the ones that would make the most sense to be the source are civet, miniz (coming from flate2), libgit2, or libpq (coming from Diesel).
    - Diesel has been run through valgrind in the past, and I did a brief code audit as well, so I don't think it's libpq.
    - It wouldn't make a lot of sense for libgit2 to be the cause, but once we've moved index updates off the web server there's very little reason we can't just subshell out to git.
    - civet makes the most sense to be the cause, and is something we're intending to remove.
    - flate2 would also make sense and is much harder to remove.
  - The 16 hour period before the spike was very consistent and not correlated with a time of day, so I don't think it was triggered by specific traffic patterns.
  - Once the spike happens, memory usage tends to remain pretty stable there, with one notable exception on the first day this was deployed, where it grew to consume all available memory, then randomly decided to allocate 116MB of swap, and shortly afterwards it freed that amount of RSS. At the time I thought the fact that it stopped at almost exactly 512MB of RAM was too specific to be a coincidence, but I now believe it was a coincidence.
- jemallocator unprefixed, `jemalloc_ctl` (jemalloc 5, unprefixed, background threads enabled)
  - This is the config in this PR.
  - Similar behavior to background threads turned off, but it takes 22-23 hours for the spike to occur.
- The behavior of the jemallocator crate is the same on both 1.32 and 1.31.

My current working theory is that we do actually have a memory leak, but what is being leaked is something extremely small. The reason this causes such a huge difference in behavior on the system allocator is that these small leaked allocations are causing fragmentation issues. For this to be the case, it would require that we are slowly requesting larger and larger allocations over time, which is what leads me to believe that civet is the cause.

I suspect the growth is from civet growing its request/response buffers (which should be the largest single allocations we have), while the leak is something very small that is just causing the allocator to be unable to re-use previously freed blocks (see the toy sketch below). It's also possible that these re-allocations are just causing this much fragmentation naturally without the leak, but I have a hard time believing the system allocator is actually performing *that* much worse here. The slow, consistent memory growth before the huge spike also suggests the small leak to me.

This would also explain some of the behavior of jemalloc, and why we're seeing that spike. jemalloc puts smaller allocations into the same arena, so it would prevent the fragmentation caused by leaking smaller allocations, and could potentially hide it completely for a while. The fact that the spike is consistently 2.25x the previous max RSS is too specific not to be the growth factor on something. I think it is jemalloc growing one of its larger arenas, but it could also be the size of some internal buffer used by civet or something else.
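To make the hypothesis concrete, here is a toy sketch (not code from the project) of the allocation pattern described above: a buffer that is reallocated at ever-larger sizes, interleaved with a tiny allocation that is never freed. The idea is that if the small leaked blocks land between the freed holes, the allocator cannot reuse or coalesce those holes for the next, larger buffer, so RSS keeps growing even though the live data stays small:

```rust
// Toy illustration of the hypothesized pattern; the sizes and iteration
// counts are arbitrary, and the per-request "leak" is simulated with
// std::mem::forget.
fn main() {
    let mut buf_size = 64 * 1024;
    for _request in 0..10_000 {
        // A request/response buffer that grows a little every iteration, so
        // each new allocation is slightly larger than the hole left behind
        // by freeing the previous one.
        let buffer = vec![0u8; buf_size];
        drop(buffer);

        // A tiny allocation that is never freed. If it lands between freed
        // regions, the hypothesis is that it prevents the allocator from
        // coalescing them or returning that memory to the OS.
        std::mem::forget(Box::new([0u8; 32]));

        buf_size += 4 * 1024;
    }
}
```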
I believe that the behavior we're seeing with jemalloc 5 isn't actually different overall from what we saw with jemalloc 4, other than the spike happening sooner. I think the spike we very rarely saw with jemalloc 4 would have occurred consistently if the server didn't restart every 24 hours.

I suspect switching from civet to hyper will remove this issue entirely. If it doesn't, I intend to dig in further. The next steps would be to log more from jemalloc on a regular interval (a rough sketch of what that could look like is at the end of this comment), get some stack traces from jemalloc for large allocations, and possibly even run valgrind in production for brief periods. Most of these things will require a lot of work and have a very low signal-to-noise ratio, which is why I've chosen to stop here, at a point where I'm confident we don't have an impending outage, and rule a few more things out before digging deeper.

Other things that I have done which are not in this PR or reflected in the code base:

- Enabled more verbose logging on Heroku's side
  - This gives us more granularity than the metrics graph gives us (every 20 seconds, and also separated by server)
  - Additional stats beyond the metrics graphs (disk cache, pages written to disk)
- Reached out to support to see if we can disable the daily restart
  - I don't intend to do this right now even if it's possible, but it's a good option to have to aid us in debugging this and the connection pooling issue.
  - It's not something they offer out of the box, but the platform team is looking into making it possible for us.
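As a hedged sketch of the "log more from jemalloc on a regular interval" step mentioned above: something like the following could periodically dump a few jemalloc statistics. The module-style accessors (`epoch::advance`, `stats::allocated::read`, `stats::resident::read`) are assumptions about the `jemalloc_ctl` API and differ between crate versions:

```rust
use std::{thread, time::Duration};

// Spawn a background thread that logs a few jemalloc statistics every
// minute. Assumes a jemalloc_ctl version exposing these module-level
// accessors; adjust to the version actually in use.
fn spawn_jemalloc_stats_logger() {
    thread::spawn(|| loop {
        // jemalloc caches most statistics; advancing the epoch refreshes them.
        if jemalloc_ctl::epoch::advance().is_ok() {
            let allocated = jemalloc_ctl::stats::allocated::read().unwrap_or(0);
            let resident = jemalloc_ctl::stats::resident::read().unwrap_or(0);
            println!(
                "jemalloc stats: allocated={} bytes, resident={} bytes",
                allocated, resident
            );
        }
        thread::sleep(Duration::from_secs(60));
    });
}
```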
Is there anything still preventing us from upgrading to 1.32, or is it just that you'd like to deploy that separately?
Nope, nothing prevents us from upgrading to 1.32, which is why I included it here (1.31 is in prod right now, but that was just to verify that the behavior was identical between 1.31 and 1.32 with this config).
Ah, I see. I was comparing this to what's on production, and production is where I saw the Rust version at 1.31. My mistake! Given that this is what's on production, this should get merged imo, so: @bors r+
📌 Commit 648e431 has been approved by
☀️ Test successful - checks-travis
Yeah sorry, the "this is in prod because that was the only way to debug this" aspect makes it rather awkward XD