Supervisord crashes when over 1023 files are open (even with ulimit set) #26
Comments
I threw together a quick hack that emulates select() with poll(). I don't recommend trusting your production boxes with my skin-deep knowledge of poll(), but it does withstand the cat-bomb as well as pass all tests. |
The above hack improperly handled POLLPRI, and it's fixed in the head of my fork |
Thanks for the quick attention, sproingie! |
Since renaming my repo (owing to some incompatible changes I'm making), I can't seem to nail down that changeset anymore, but if you're trying it, you'll want to grab the code out of the head revision. I recall making at least one extra change, namely multiplying the timeout by 1000. Turns out that select.select specifies the timeout in seconds, whereas for select.poll it's in milliseconds. Oops. The tiny timeout caused a lot of spinning and probably even some livelock. It's still a hack job, since the proper design would be to keep the poll object around persistently and not try to make it emulate the statelessness of select(). |
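For readers following along, the select()-over-poll() emulation described above can be sketched roughly as follows. This is not the code from the fork mentioned in this thread; `select_via_poll` is a hypothetical name, and the sketch assumes plain integer file descriptors and a platform where `select.poll` is available. It includes the seconds-to-milliseconds conversion described in the comment above.

```python
import select


def select_via_poll(rlist, wlist, xlist, timeout=None):
    """Approximate select.select() using select.poll() (illustrative sketch).

    Assumes rlist/wlist/xlist contain integer file descriptors.
    """
    # Build one event mask per fd, since an fd may appear in several lists.
    masks = {}
    for fd in rlist:
        masks[fd] = masks.get(fd, 0) | select.POLLIN
    for fd in wlist:
        masks[fd] = masks.get(fd, 0) | select.POLLOUT
    for fd in xlist:
        masks[fd] = masks.get(fd, 0) | select.POLLPRI

    # A fresh poll object per call mimics select()'s statelessness.
    poller = select.poll()
    for fd, mask in masks.items():
        poller.register(fd, mask)

    # select() takes seconds; poll() takes milliseconds.
    timeout_ms = None if timeout is None else int(timeout * 1000)

    readable, writable, exceptional = [], [], []
    for fd, event in poller.poll(timeout_ms):
        # Report hangups and errors as readable so the caller notices EOF.
        if event & (select.POLLIN | select.POLLHUP | select.POLLERR):
            readable.append(fd)
        if event & select.POLLOUT:
            writable.append(fd)
        if event & select.POLLPRI:
            exceptional.append(fd)
    return readable, writable, exceptional
```

Rebuilding the poll object on every call is the "hack" part: as noted above, a cleaner design would keep one poll object around and register or unregister descriptors as processes come and go.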
FWIW, I think there is a way to compile a Python (involving some FD_SETSIZE hackery IIRC) that allows for more file descriptors to be accessible by select(). Googling doesn't lead to any obvious URLs however. |
Running into the error in the initial post. Any advice on how to deal with it? |
Currently the only workaround is to compile a Python that supports > 1024 file descriptors and run supervisor under that. |
Any idea how to do that? Google hasn't been helpful. I have the python 2.7.2 source extracted and ready to go. |
Nope. As I said in a previous entry, I could not find a suitable Google entry. Likely have to either ask on python newsgroup or stackoverflow. |
Shouldn't supervisor just switch from select.select to select.poll? By my math (5 fds per child), this restricts supervisor to about 204 processes, actually fewer if you subtract stdin/stdout/stderr, listeners for rpc/http, and whatever shlibs Python has open. So maybe 200 or 201. For the time being, we are probably going to cope with this by running two instances of supervisord and splitting our workload among them. |
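The arithmetic in the comment above can be written out as a small sketch. The figure of 5 descriptors per child is the commenter's estimate, the limit of 1024 selectable descriptors (0 through 1023) comes from the original report, and `max_children` and its `reserved` parameter are hypothetical names used only for illustration.

```python
def max_children(fd_limit=1024, fds_per_child=5, reserved=0):
    """Estimate how many children fit under select()'s descriptor ceiling.

    fd_limit: number of descriptors select() can handle (0..1023).
    fds_per_child: descriptors supervisord holds per child (an estimate).
    reserved: descriptors used for other things (stdio, listeners, ...).
    """
    return (fd_limit - reserved) // fds_per_child


# With nothing reserved: 1024 // 5
print(max_children())  # 204
```

How many descriptors are actually reserved depends on the installation, so the "200 or 201" figure above is best read as a rough estimate.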
I am taking a whack at fixing this myself, since Chuck Adams can't seem to find his change set: https://github.com/linuxtampa/supervisor |
We just ran into this issue in production today. Not sure what our interim solution will be. Running two supervisors would be awkward. |
I looked into maybe trying to implement the mainloop in terms of select.poll, but it doesn't appear to work on Mac OS X, or at least the out-of-the-box Python builds on Mac OS X don't support it: http://bugs.python.org/issue5154 Bleh. |
I tried making that change too... it seemed to work for the first day, but https://github.com/linuxtampa/supervisor tlj |
Hi, I've started working on replacing select() with poll(). My fork is at: https://github.com/igorsobreira/supervisor/commits/master
I would love some feedback, and please let me know if I'm on the wrong track. |
I've sent an email with updates: http://lists.supervisord.org/pipermail/supervisor-users/2012-March/001036.html |
See also #145 which sounds like it is also caused by this issue. |
What is the status with this issue? @igorsobreira what about your pull request? |
@weissi my pull request has a working solution; there are no more features I had in mind for it. But two issues were reported on the pull request (see the comments), possibly the same one, and I haven't had time to dive into them yet. I plan to investigate this hands-on on Linux this weekend. Anyway, it needs more testing, and maybe an update against supervisor master. |
Cool, thank you! |
Any possible hope of this being addressed? :( |
Our organization is hitting this same bug too. It's a pretty big deal. |
Same bug. Thanks to @igorsobreira, his version works fine for me. |
Just installed 3.0 and hitting this issue. Is there a plan to resolve this? |
I hit the same bug. Thanks to @igorsobreira, your work is very cool. |
Supervisord uses select.select to monitor filehandles related to the processes it supervises. This is problematic because select.select raises a ValueError for filehandles numbered >1023. (Observed with supervisor 3.0a8 on an Ubuntu GNU/Linux 11.04 amd64 machine.)
We ran into this problem when running approximately 254 supervised processes. Initially, we assumed it was a ulimit configuration problem, but found that the crash occurred even when running supervisord in non-daemon mode. I've been able to reproduce the stacktrace by supervising a large number of /bin/cat processes, and have included it below. Here's a conf file to run 1100 cats:
https://gist.github.com/1068713
To reproduce this bug, just install that config file and run something like:
sudo bash -c "ulimit -n 10000; supervisord -n"
You'll see a ValueError (out of range) from select.select(), called from supervisord's runforever():
https://github.com/Supervisor/supervisor/blob/master/supervisor/supervisord.py#L218
It appears this is a limitation of Python's select() function, which raises a ValueError on file descriptors > 1023. I've seen some suggestions that beyond this limit, one should use poll() instead of select(), but I'm not an expert.
FULL TRACEBACK:
Traceback (most recent call last):
  File "/usr/bin/supervisord", line 9, in <module>
    load_entry_point('supervisor==3.0a8', 'console_scripts', 'supervisord')()
  File "/usr/lib/pymodules/python2.7/supervisor/supervisord.py", line 371, in main
    go(options)
  File "/usr/lib/pymodules/python2.7/supervisor/supervisord.py", line 381, in go
    d.main()
  File "/usr/lib/pymodules/python2.7/supervisor/supervisord.py", line 94, in main
    self.run()
  File "/usr/lib/pymodules/python2.7/supervisor/supervisord.py", line 111, in run
    self.runforever()
  File "/usr/lib/pymodules/python2.7/supervisor/supervisord.py", line 229, in runforever
    r, w, x = self.options.select(r, w, x, timeout)
  File "/usr/lib/pymodules/python2.7/supervisor/options.py", line 1097, in select
    return select.select(r, w, x, timeout)
ValueError: filedescriptor out of range in select()
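The same ValueError can likely be triggered without supervisord at all. The sketch below assumes CPython on Linux, where FD_SETSIZE is typically 1024 and, as far as I can tell, the range check happens before the underlying select() call, so the descriptor does not even need to be open. `select_rejects` is a hypothetical helper name.

```python
import select


def select_rejects(fd):
    """Return True if select.select() refuses fd as out of range."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # Raised when fd falls outside the range select() can represent.
        return True
    except OSError:
        # fd is in range but not an open descriptor (EBADF).
        return False
    return False


if __name__ == "__main__":
    # Assumption: FD_SETSIZE is 1024 on this platform.
    print(select_rejects(1024))
```

This is consistent with the report above: raising ulimit lets the process open descriptors numbered above 1023, but select.select() still cannot watch them.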