memory issues #157

Closed
IPv6 opened this issue Aug 30, 2012 · 16 comments

@IPv6

IPv6 commented Aug 30, 2012

Hello!

Hit one problem: after upgrading OpenResty to 1.2.1.14 (from 1.0.**), nginx started to "grow" in memory while serving Lua-based calls. In the change log I found that Lua code execution changed slightly in this version, so it seems related. Where can I read about how this change affected memory usage in Lua?

I am using global vars (keeping in mind that they are per-worker; that's OK for me), but simply counting their size I can see that this is not the reason... It seems like big chunks of objects are stuck in Lua somehow.

BTW, it would be nice to have a method to get the memory size used by Lua in nginx, just to be sure and to monitor this parameter.

Any advice is welcome. And thanks for the great tool :)

@agentzh
Member

agentzh commented Aug 30, 2012

Hello!

On Thu, Aug 30, 2012 at 1:39 AM, Ilja Razinkov wrote:

Hit one problem: after upgrading OpenResty to 1.2.1.14 (from 1.0.**), nginx started to "grow" in memory while serving Lua-based calls.

Is it growing indefinitely, or just using a constant amount of more
memory than before?

Were you using 1.0.15.10? Could you confirm that?

In the change log I found that Lua code execution changed slightly in this version, so it seems related. Where can I read about how this change affected memory usage in Lua?

There's one change: _G is no longer shared among all the requests
when it is used in the code chunks directly created by those
*_by_lua config directives.
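
For example, a minimal sketch of the difference (the mydata module and the /t location are made up for illustration; mydata.lua is assumed to be somewhere on the lua_package_path):

-- mydata.lua: a plain Lua module; its state lives per worker process
local _M = {}
_M.counter = 0
return _M

location = /t {
    content_by_lua '
        -- "hits" is a global written directly in this *_by_lua chunk;
        -- with the change above it is no longer shared across requests
        hits = (hits or 0) + 1

        -- module-level state still persists for the lifetime of the worker
        local mydata = require "mydata"
        mydata.counter = mydata.counter + 1

        ngx.say("hits = ", hits, ", counter = ", mydata.counter)
    ';
}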

I am using global vars (keeping in mind that they are per-worker; that's OK for me), but simply counting their size I can see that this is not the reason... It seems like big chunks of objects are stuck in Lua somehow.

What kind of Lua globals are you using? Are you using _G in the code
chunks directly? Or just using Lua module-level globals?

BTW, it would be nice to have a method to get the memory size used by Lua in nginx, just to be sure and to monitor this parameter.

Yes, you can get that from the standard Lua API, for example:

location = /gc {
    content_by_lua '
        ngx.say(string.format("Worker %d: GC size: %.3f KB",
                      ngx.var.pid, collectgarbage("count")))
    ';
}

And then accessing this /gc interface will give you something like this:

$ curl localhost:1984/gc
Worker 31980: GC size: 42.924 KB

Any advice is welcome. And thanks for the great tool :)

Thank you for your nice words :) If it's not the issue with the _G
change mentioned above, could you please try minimizing your config
and Lua code while keeping the issue reproducible? It'll be great if I
can reproduce this on my side :) If that still fails for you, I'll try
to provide more tools for you to analyse the details of memory
usage in both the Nginx space and the Lua space on your side.

Best regards,
-agentzh

@IPv6
Author

IPv6 commented Aug 31, 2012

Thanks, I'll add collectgarbage("count") to the monitored parameters and will look for a correlation. This will narrow down the problem at least!!

  • I upgraded from version 1.0.15.10, you are right. Is there something specific?
  • It keeps growing, but not with every request. The server is under heavy load and memory usage jumps up significantly after a certain amount of time (I sample server stats every 2 hours). Maybe it happens when it hits specific stuff in the Lua code (but I checked and didn't find suspicious places yet...). Very rarely memory usage drops back (but still higher than the "normal level", not correlated with server load).
  • I am not using _G directly, just Lua module "locals". These modules are all loaded in the newly introduced "init_by_lua" (in 1.0.15.10 they were loaded in a warm-up request...).

P.S. Is there something to dig deeper with (an nginx patch or whatever)?

@agentzh
Member

agentzh commented Aug 31, 2012

Hello!

On Fri, Aug 31, 2012 at 2:40 AM, Ilja Razinkov [email protected] wrote:

Thanks, I'll add collectgarbage("count") to the monitored parameters and will look for a correlation. This will narrow down the problem at least!!

Please tell me the result when you have it ;)

I upgraded from version 1.0.15.10, you are right. Is there something specific?

There are actually a lot of changes in ngx_lua in ngx_openresty 1.2.1.14
since 1.0.15.10 (as you can see from the change logs). And I think you
really do not want to test all the development releases of
ngx_openresty in between.

It keeps growing, but not with every request. The server is under heavy load and memory usage jumps up significantly after a certain amount of time (I sample server stats every 2 hours). Maybe it happens when it hits specific stuff in the Lua code (but I checked and didn't find suspicious places yet...). Very rarely memory usage drops back (but still higher than the "normal level", not correlated with server load).

Could you please give concrete numbers (for the memory usage) here?
How is it distributed among all the nginx worker processes? What are
the numbers for RSS, VIRT, and other memory-related metrics?

I am not using _G directly, just Lua module "locals". These modules are all loaded in the newly introduced "init_by_lua" (in 1.0.15.10 they were loaded in a warm-up request...).

That sounds fine :)

P.S. Is there something to dig deeper with (an nginx patch or whatever)?

What kind of operating system are you on? Linux? FreeBSD? Solaris? Or
something else?

Best regards,
-agentzh

@IPv6
Author

IPv6 commented Sep 3, 2012

New data:

  • I am using Debian:
    Linux mail.pocketlistsapp.com 2.6.26-2-xen-amd64 #1 SMP Wed Sep 21 05:57:38 UTC 2011 x86_64 GNU/Linux
    1 GB RAM
  • The Lua garbage size is not correlated with the nginx memory hogging :(( I restricted nginx to only 1 worker to eliminate side effects and found that nginx memory grows independently. The Lua garbage floats up and down in reasonable amounts... It seems the Lua garbage collector is OK and not the reason for my problem.

2012/09/02 13:26:09
lua_mem = "Worker 4764: GC size: 838.661 KB",
4764 nobody 20 0 74596 12m 2084 S 0 1.2 13:21.64 nginx

2012/09/02 15:31:09
lua_mem = "Worker 4764: GC size: 1033.789 KB",
4764 nobody 20 0 427m 367m 2076 S 0 36.0 15:33.96 nginx

Still, there were no code changes before and after the upgrade; this is why the main suspect is the upgrade itself :(
Maybe there is a way to look into nginx internals and get info on which internal parts the memory is really used in?

@agentzh
Member

agentzh commented Sep 3, 2012

Hello!

On Mon, Sep 3, 2012 at 2:16 AM, Ilja Razinkov [email protected] wrote:

new data:

I am using Debian:
Linux mail.pocketlistsapp.com 2.6.26-2-xen-amd64 #1 SMP Wed Sep 21 05:57:38 UTC 2011 x86_64 GNU/Linux
1 GB RAM

The Lua garbage size is not correlated with the nginx memory hogging :(( I restricted nginx to only 1 worker to eliminate side effects and found that nginx memory grows independently. The Lua garbage floats up and down in reasonable amounts... It seems the Lua garbage collector is OK and not the reason for my problem.

This information is quite important :)

Still, there were no code changes before and after the upgrade; this is why the main suspect is the upgrade itself :(
Maybe there is a way to look into nginx internals and get info on which internal parts the memory is really used in?

Yes, please use the tools provided by my Nginx SystemTap Toolkit:

https://github.com/agentzh/nginx-systemtap-toolkit

I suggest you run these scripts in your test environment first because
I'm not 100% sure about the stability of SystemTap on your version of
Debian Linux :)

I'm looking forward to your new results and new findings!

Thanks!
-agentzh

@IPv6
Author

IPv6 commented Sep 4, 2012

SystemTap is not compiled for my Linux distribution by default (2.6.26-2-xen-amd64), and it seems I need to recompile the kernel with some flags enabled for it to work, which is overkill... I'll try to switch back to 1.0.15.10 to double-check what happens.

Maybe there are some nginx modules to grab internal info? For now I found HttpStubStatusModule, but it's minimalistic; just common stats are gathered...

Thanks for the help!

@agentzh
Member

agentzh commented Sep 4, 2012

Hello!

On Tue, Sep 4, 2012 at 3:37 AM, Ilja Razinkov [email protected] wrote:

SystemTap is not compiled for my Linux distribution by default (2.6.26-2-xen-amd64), and it seems I need to recompile the kernel with some flags enabled for it to work, which is overkill...

Is it possible for you to upgrade your Linux kernel to 3.5.x? The
latest 3.5 series includes the inode-based uprobe API by default so
SystemTap should work out of the box.

Alternatively, we'll have to wait for the upcoming SystemTap 2.0
release, which will include another backend based on dyninst, which is
based on ptrace and requires no new kernel features.

I'll try to switch back to 1.0.15.10 to double-check what happens.

That's cool.

Maybe there are some nginx modules to grab internal info? For now I found HttpStubStatusModule, but it's minimalistic; just common stats are gathered...

I think we really really need dynamic tracing here. If you have a
working SystemTap (or DTrace on Solaris, FreeBSD, and Mac OS X), then
such leaks in nginx or other user applications will be much much
easier to debug.

Best regards,
-agentzh

@IPv6
Author

IPv6 commented Sep 7, 2012

Hello!
Investigation continued :)
Advice needed!

It seems like the memory hogging can still be produced by Lua code; I see memory spikes even on 1.0.15.10 (though they are much less noticeable). It seems like this happens after requests that make a lot of sequential ngx.location.capture calls, specifically to Redis and MySQL (via the drizzle module). Possibly when these requests time out en masse.

Can this be the case? Maybe they are stacked up somewhere inside the request with delayed cleanup, and this effect spans a long time period?

@IPv6
Author

IPv6 commented Sep 7, 2012

PS: I'll try to upgrade to 3.5.x in the future (and use SystemTap), BTW, if other attempts fail...

@IPv6
Author

IPv6 commented Sep 7, 2012

I made some tests and I can see now that using location.capture eats up memory until the end of the request, and in rare cases memory usage does not drop down even after the request is finished.

The test looks like this (just to get the idea):
local ngxRedis = import_wli("wli_pcapi.rob.ngx_redis")
local ngxMysql = import_wli("wli_pcapi.rob.ngx_mysql")
for i = 1, 5000 do
    ngxRedis.direct_getStringKey("ASDASDASD" .. i)
    ngxMysql.execDbQuery_getFirstResult("select 1 from rob_allfields;")
end

This happens both in the old and the latest version of OpenResty. I understand the reasons for this behaviour, but for me it is a dangerous thing: several heavy requests executed at the same time can "eat up" all the memory... Adding collectgarbage("step") does not help much :(

Is there a way to manually clean up location.capture's internal stuff during request execution? Seems like this would be a solution to all the troubles :)

@agentzh
Member

agentzh commented Sep 7, 2012

Hello!

On Fri, Sep 7, 2012 at 2:57 AM, Ilja Razinkov [email protected] wrote:

I made some tests and I can see now that using location.capture eats up memory until the end of the request, and in rare cases memory usage does not drop down even after the request is finished.

Could you please confirm that the memory usage keeps rising without
limit? Because from the operating system's point of view, the memory
usage may not drop completely due to memory fragmentation.

Also, could you confirm that memory leaks happen when specific events
happen? Like upstream timeout? Could you please paste out the original
error messages in your nginx error log file?

The test looks like this (just to get the idea):
local ngxRedis = import_wli("wli_pcapi.rob.ngx_redis")
local ngxMysql = import_wli("wli_pcapi.rob.ngx_mysql")
for i = 1, 5000 do
    ngxRedis.direct_getStringKey("ASDASDASD" .. i)
    ngxMysql.execDbQuery_getFirstResult("select 1 from rob_allfields;")
end

This happens both in the old and the latest version of OpenResty. I understand the reasons for this behaviour, but for me it is a dangerous thing: several heavy requests executed at the same time can "eat up" all the memory... Adding collectgarbage("step") does not help much :(

The Nginx subrequests all share the same memory pool as the main
request. So many memory resources, especially small blocks below about
4KB, will not be freed until the time the pool of the main request is
destroyed.

So it is recommended to use the lua-resty-redis and lua-resty-mysql
libraries to access Redis and MySQL, respectively, when you have many
such upstream requests in a single nginx request:

https://github.com/agentzh/lua-resty-redis
https://github.com/agentzh/lua-resty-mysql

Both of these libraries are based on the ngx_lua cosocket API and do
not suffer from the memory pool sharing issues. And most of the time,
they use much less memory than the subrequest +
ngx_drizzle/ngx_srcache approach.

Also, they're included and enabled by default in your version of
the ngx_openresty bundle :)
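
For example, the test loop you quoted above could talk to Redis directly over a cosocket, roughly like this (a minimal sketch; the location name and key names are just for illustration):

location = /t {
    content_by_lua '
        local redis = require "resty.redis"
        local red = redis:new()
        red:set_timeout(1000)  -- 1 sec

        local ok, err = red:connect("127.0.0.1", 6379)
        if not ok then
            ngx.say("failed to connect to redis: ", err)
            return
        end

        for i = 1, 5000 do
            local res, err = red:get("ASDASDASD" .. i)
            if err then
                ngx.say("redis error: ", err)
                return
            end
        end

        -- put the connection into the built-in connection pool
        -- instead of closing it
        red:set_keepalive(10000, 100)
        ngx.say("done")
    ';
}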

Is there a way to manually clean up location.capture's internal stuff during request execution? Seems like this would be a solution to all the troubles :)

No, not really. This is one of the limitations in the Nginx subrequest
design, and that's also one of the reasons for the existence of the
ngx_lua cosocket API :)

Best regards,
-agentzh

@agentzh
Member

agentzh commented Sep 7, 2012

BTW, are you using LuaJIT 2.0 or the standard Lua 5.1 interpreter?
That is, have you specified the --with-luajit option while building
ngx_openresty?

Usually, LuaJIT uses much less memory than the standard Lua interpreter.
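
A quick way to check which one your workers are actually running (the /luaver location is just for illustration):

location = /luaver {
    content_by_lua '
        -- prints something like "LuaJIT 2.0.0-beta10" under LuaJIT,
        -- or "Lua 5.1" under the standard interpreter
        ngx.say(jit and jit.version or _VERSION)
    ';
}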

Best regards,
-agentzh

@IPv6
Author

IPv6 commented Sep 12, 2012

Thanks a lot for lua-resty-redis and lua-resty-mysql. I ported our code to these modules instead of Redis2 and Drizzle and... the problem is gone! So it was an nginx subrequest memory management problem after all.

It's a pity this behaviour is not described anywhere :( I think it is not "expected by default", and backend servers are bound to make a lot of calls in sequence, in a potentially harmful way. Besides that, it would be helpful to note this side effect in the drizzle/redis2 descriptions... since they look like the "recommended" modules at openresty.org, whereas lua-resty-redis/lua-resty-mysql are just noted in the change logs.
Or maybe I just overlooked such info :) Anyway, thanks for the great tool, again :)

@agentzh
Member

agentzh commented Sep 12, 2012

Hello!

On Wed, Sep 12, 2012 at 2:14 AM, Ilja Razinkov [email protected] wrote:

Thanks a lot for lua-resty-redis and lua-resty-mysql. I ported our code to these modules instead of Redis2 and Drizzle and... the problem is gone! So it was an nginx subrequest memory management problem after all.

Good to know that! The lua-resty-* libraries indeed use much less
memory than the subrequest capturing approach.

But I'm not 100% sure if libdrizzle leaks here in some edge cases :)

It's a pity this behaviour is not described anywhere :( I think it is not "expected by default", and backend servers are bound to make a lot of calls in sequence, in a potentially harmful way.

Besides that, it would be helpful to note this side effect in the drizzle/redis2 descriptions... since they look like the "recommended" modules at openresty.org, whereas lua-resty-redis/lua-resty-mysql are just noted in the change logs.

The web site is lagging behind a bit because the cosocket thing and
all those various lua-resty-* libraries are kinda new (born this
year). I'm sorry for the inconvenience.

Will you help me improve the openresty.org web site? The source
repository is here:

https://github.com/agentzh/openresty.org

You can fork and create a pull request or I can send you a commit bit
to my repository if you wish :)

Anyway, thanks for the great tool, again :)

Glad you like it :)

Thanks!
-agentzh

@IPv6
Author

IPv6 commented Sep 13, 2012

Thanks, I'll try to help with the docs; good stuff needs good docs :)

@agentzh
Member

agentzh commented Sep 27, 2012

Consider it resolved.

@agentzh agentzh closed this as completed Sep 27, 2012