multithreaded sessions engineering #169


Closed
alex116 opened this issue Dec 19, 2012 · 5 comments

Comments

@alex116

alex116 commented Dec 19, 2012

Not sure if this is the right place to ask this:

I'm using the multithreaded server example with the producer/consumer model for thread management, and I have built on it.
I created a Session class that has multiple connections associated with it based on login data, but I am having problems when the on_close handler is called while Requests that use the now-closed connection are still being processed.
I'm looking for a way to solve this and was wondering if you could recommend some literature or perhaps a way to solve this problem.
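One common way to keep a Session from touching connections that have already closed is to have it hold non-owning `std::weak_ptr` handles and promote each one to a `shared_ptr` only for the duration of a send. The sketch below illustrates that idea under stated assumptions: `Connection`, `Session`, `add`, `remove`, and `broadcast` are hypothetical names (not websocketpp 0.2.x API), and `Connection::send` merely records payloads.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a library connection.
struct Connection {
    std::vector<std::string> sent;  // records payloads "sent" to this connection
    void send(const std::string& payload) { sent.push_back(payload); }
};

// Hypothetical Session grouping the connections of one login. It stores
// weak_ptrs, so it never keeps a closed connection alive on its own.
class Session {
public:
    void add(const std::shared_ptr<Connection>& con) {
        assert(con && "session members must be non-null");
        std::lock_guard<std::mutex> lock(mutex_);
        connections_.push_back(con);
    }

    // Intended to be called from the on_close handler.
    void remove(const std::shared_ptr<Connection>& con) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<std::weak_ptr<Connection>> kept;
        for (const auto& w : connections_) {
            auto sp = w.lock();
            if (sp && sp != con) kept.push_back(w);  // drop expired and removed entries
        }
        connections_.swap(kept);
    }

    // Sends to every connection that is still alive; returns how many got it.
    std::size_t broadcast(const std::string& payload) {
        std::vector<std::shared_ptr<Connection>> live;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            for (const auto& w : connections_)
                if (auto sp = w.lock()) live.push_back(sp);
        }
        // Send outside the session lock; each shared_ptr keeps its target alive.
        for (const auto& sp : live) sp->send(payload);
        return live.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::weak_ptr<Connection>> connections_;
};
```

In this arrangement the on_close handler calls `remove`, and a worker that runs `broadcast` afterwards simply skips the closed connection instead of dereferencing freed memory.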

@zaphoyd
Owner

zaphoyd commented Dec 19, 2012

Can you help me understand exactly what the question is?

What I think is happening based on your description: Each connection performs some sort of authentication/login step that results in it being associated with a Session. A Session can have multiple connections. Requests are processed and sent to all connections in the associated Session? The issue happens when a connection is closed by the remote endpoint after a request is received but before the result is written out to the socket.

Is the issue that connections close correctly and you are unsure how to clean up or avoid sending to closed connections? Or are connections closing before you think they should? I.e., you would expect request -> response -> close but get request -> close -> response due to thread scheduling? Is this an intermittent problem or is it consistently reproducible?

@alex116
Author

alex116 commented Dec 19, 2012

Hello zaphoyd, thanks for taking the time to reply to my question. You have assessed the problem correctly, although I am only sending data to those connections that have sent a request to be answered. I believe your implementation is working great, but I am unsure how to avoid sending to closed connections. The connection is closed by the remote endpoint (pressing F5 in a browser that connects to the server via JavaScript) after a request is received but before the result is written out to the socket. This causes a segmentation fault when I try to use the connection_ptr.
I am thinking of testing `if (connection_ptr->get_state() == OPEN) { /* send data */ }`, but wouldn't that be thread-unsafe? What if the connection is closed right after get_state() returns?
Edit: I am stress testing my web application and consistently hit this issue when repeatedly pressing F5 (manually) at different speeds.
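The concern about `get_state()` is a classic check-then-act race: another thread can close the connection between the state check and the send. A minimal sketch of the usual remedy, assuming a hypothetical `Connection` class (not the library's own), performs the state check and the write under the same mutex so that a close cannot slip in between.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical connection whose state and write path share one mutex.
class Connection {
public:
    enum class State { open, closing, closed };

    // Check and act atomically: returns true if the payload was queued,
    // false if the connection was no longer open (payload silently dropped).
    bool send_if_open(const std::string& payload) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (state_ != State::open) return false;
        outgoing_.push_back(payload);
        return true;
    }

    // Called by the network side when the remote endpoint goes away.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        state_ = State::closed;
    }

    std::size_t queued() {
        std::lock_guard<std::mutex> lock(mutex_);
        return outgoing_.size();
    }

private:
    std::mutex mutex_;
    State state_ = State::open;
    std::vector<std::string> outgoing_;
};
```

With this shape the caller never inspects the state itself; it just sends, and the connection decides under its own lock whether the payload can still go out.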

@zaphoyd
Owner

zaphoyd commented Dec 19, 2012

Okay. You've hit on a point where the 0.2.x version of this library has some known issues. The library wasn't designed to be thread safe and the retrofitting that I have attempted to do has some problems, as you have noticed. That said, there should be a guard even in 0.2.x that should silently ignore sends to connections that are closing or closed. If you still have a connection_ptr, that should be a reference counted shared pointer that keeps the dead connection around even after the library has forgotten about it. This should mean that it is always safe to call the send method of a connection_ptr from anywhere.
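The lifetime argument in the paragraph above can be illustrated with standard `std::shared_ptr` (Boost's `shared_ptr` reference counting behaves the same way for this purpose). In this sketch `Endpoint`, `Connection`, and `on_close` are hypothetical stand-ins: the endpoint erases its own reference when the connection closes, yet a worker's copy keeps the object valid and sends on it are ignored. This only holds if the worker's copy was made while some owner was still alive.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <set>
#include <string>

// Hypothetical connection: send() becomes a no-op once closed.
class Connection {
public:
    void close() { std::lock_guard<std::mutex> l(m_); open_ = false; }
    bool send(const std::string&) {
        std::lock_guard<std::mutex> l(m_);
        if (!open_) return false;  // ignored, but the object is still valid
        ++delivered_;
        return true;
    }
    int delivered() { std::lock_guard<std::mutex> l(m_); return delivered_; }
private:
    std::mutex m_;
    bool open_ = true;
    int delivered_ = 0;
};

using connection_ptr = std::shared_ptr<Connection>;

// Hypothetical bookkeeping: the "library" forgets a connection on close.
struct Endpoint {
    std::set<connection_ptr> live;
    // Taken by value so the connection stays alive for the whole call,
    // even after it is erased from the set.
    void on_close(connection_ptr con) {
        con->close();
        live.erase(con);
    }
};
```

Here the endpoint's set and the worker's copy are independent owners, so the endpoint dropping its entry never invalidates the worker's handle.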

Can you give me a little more information about the segfault you are getting? In particular, what thread is send being called from and how did that thread get its copy of the connection_ptr. Is it being called from inside a library handler or was it passed out to another worker thread?

Also: If you are doing serious multithreaded work, you should look for the 0.3 preview release later this month. It has been reworked to provide much more robust thread safety.

@alex116
Author

alex116 commented Dec 19, 2012

Some detail on how send is being called:
I'm using the on_message handler to create a Request. I assign the connection_ptr to a member variable of the Request using the "=" operator and then place the Request on the request_coordinator's queue. A worker thread (process_request) takes the Request from the queue and processes it.
Processing the Request generates a reply. To send the reply back, I generate more Requests which have a bool to indicate that they are replies and should be sent to the remote endpoint and should not be locally processed. The Requests are placed back on the queue for processing by worker threads. This is so that one worker thread doesn't have to spend time sending data sequentially to all connections. Hopefully this results in the data being sent out faster through the work of many worker threads.
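The request pipeline described above can be sketched as a blocking queue whose elements own a `shared_ptr` to their connection by value. This is an illustration under assumptions, not the multithreaded server example's actual code: `Request`, `RequestQueue`, `payload`, and `is_reply` are hypothetical names, and `Connection` is an empty placeholder.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

struct Connection {};  // hypothetical placeholder for the library connection

// Each Request owns its own copy of the connection handle.
struct Request {
    std::shared_ptr<Connection> con;
    std::string payload;
    bool is_reply = false;  // true: send to remote endpoint, don't process locally
};

// Minimal blocking queue shared by the message handler (producer)
// and the worker threads (consumers).
class RequestQueue {
public:
    void push(Request r) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(r));
        }
        cv_.notify_one();  // wake one waiting worker
    }

    // Blocks until a request is available, then moves it out.
    Request pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        Request r = std::move(queue_.front());
        queue_.pop();
        return r;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Request> queue_;
};
```

Because each queued `Request` holds its own reference, the connection object cannot be freed while a request for it is still waiting, even if on_close has already run.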

The segfault appears when I try to copy the current Request's copy of the connection_ptr into the Request that is being used to build the reply.

In short, it's a worker thread that got a copy of the connection_ptr through the on_message handler.
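In the first backtrace, the crash happens inside the `shared_ptr` copy constructor with a reference-count pointer of `0x3f`, which suggests (though does not prove) that the `shared_ptr` being copied was no longer valid memory, for example because the source `Request` had already been freed. Copying is only safe while the source object is alive. A minimal sketch of building a reply under that rule, using hypothetical `Request` and `make_reply` names:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Connection {};  // hypothetical placeholder

struct Request {
    std::shared_ptr<Connection> con;
    std::string payload;
    bool is_reply = false;
};

// The caller must keep `source` alive for the duration of this call;
// the returned reply then holds its own reference-counted handle.
Request make_reply(const Request& source, std::string result) {
    assert(source.con && "source request must still own its connection handle");
    Request reply;
    reply.con = source.con;  // atomic increment of the shared use count
    reply.payload = std::move(result);
    reply.is_reply = true;
    return reply;
}
```

Once `make_reply` returns, the incoming request can be destroyed safely; only copying from a Request that has already been destroyed (through a dangling pointer or reference) is undefined behavior.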

Here's my gdb trace from Windows 7 64-bit, compiling everything for 32-bit and running it in the 64-bit environment:
```
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 6116.0x163c]
0x0059e2b9 in boost::detail::atomic_increment (pw=0x3f)
at D:/www/lib/platforms/windows/boost/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:66
(gdb) bt
#0 0x0059e2b9 in boost::detail::atomic_increment (pw=warning: (Internal error: pc 0x3f in read in psymtab, but not in symtab.)
0x3f)
at D:/www/lib/platforms/windows/boost/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:66
#1 0x0059e127 in boost::detail::sp_counted_base::add_ref_copy (this=warning: (Internal error: pc 0x3b in read in psymtab, but not in symtab.)
0x3b)
at D:/www/lib/platforms/windows/boost/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:133
#2 0x0059c770 in boost::detail::shared_count::shared_count (this=0xaaef898, r=...)
at D:/www/lib/platforms/windows/boost/include/boost/smart_ptr/detail/shared_count.hpp:316
#3 0x00512169 in boost::shared_ptr<websocketpp::connection<websocketpp::endpoint<websocketpp::role::server, websocketpp::socket::plain, websocketpp::log::logger>, websocketpp::role::server<websocketpp::endpoint<websocketpp::role::server, websocketpp::socket::plain, websocketpp::log::logger> >::connection, websocketpp::socket::plain<websocketpp::endpoint<websocketpp::role::server, websocketpp::socket::plain, websocketpp::log::logger> >::connection> >::shared_ptr (this=0xaaef894, r=...) at D:/www/lib/platforms/windows/boost/include/boost/smart_ptr/shared_ptr.hpp:206
#4 0x004174ad in CConnection::send (this=0x684cd78, session=0x684a9a0, data=...)
at ..\src\modules\webserver\src\CConnection.cpp:67
..... more frames of my code ......
```

If I call get_state on the connection instead of copying it over, I get the following backtrace:

```
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 7420.0x3c8]
0x761d13c8 in KERNEL32!GetPrivateProfileStructA () from C:\Windows\syswow64\kernel32.dll
(gdb) bt
#0 0x761d13c8 in KERNEL32!GetPrivateProfileStructA () from C:\Windows\syswow64\kernel32.dll
#1 0x005a2cc3 in boost::detail::interlocked_read_acquire (x=0x223)
at D:/www/lib/platforms/windows/boost/include/boost/thread/win32/interlocked_read.hpp:59
#2 0x005a2d2b in boost::detail::basic_recursive_mutex_impl<boost::detail::basic_timed_mutex>::try_recursive_lock (
this=warning: (Internal error: pc 0x21f in read in psymtab, but not in symtab.)
0x21f, current_thread_id=968)
at D:/www/lib/platforms/windows/boost/include/boost/thread/win32/basic_recursive_mutex.hpp:98
#3 0x005a2d77 in boost::detail::basic_recursive_mutex_impl<boost::detail::basic_timed_mutex>::lock (this=warning: (Internal error: pc 0x21f in read in psymtab, but not in symtab.)
0x21f)
at D:/www/lib/platforms/windows/boost/include/boost/thread/win32/basic_recursive_mutex.hpp:54
#4 0x00510db5 in boost::lock_guard<boost::recursive_mutex>::lock_guard (this=0xa86f74c, m_=...)
at D:/www/lib/platforms/windows/boost/include/boost/thread/locks.hpp:264
#5 0x005d7675 in websocketpp::connection<websocketpp::endpoint<websocketpp::role::server, websocketpp::socket::plain, websocketpp::log::logger>, websocketpp::role::server<websocketpp::endpoint<websocketpp::role::server, websocketpp::socket::plain, websocketpp::log::logger> >::connection, websocketpp::socket::plain<websocketpp::endpoint<websocketpp::role::server, websocketpp::socket::plain, websocketpp::log::logger> >::connection>::get_state (this=warning: (Internal error: pc 0x3b in read in psymtab, but not in symtab.)
0x3b)
at D:/www/lib/platforms/windows/websocketpp/include/connection.hpp:181
#6 0x004174ae in CConnection::send (this=0x9036cf0, session=0x9034378, data=...)
at ..\src\modules\webserver\src\CConnection.cpp:64
```

If I just call connection_ptr->send instead of trying to construct a Request and put it on the queue, I get this backtrace:

```
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 6872.0x98c]
0x761d13c8 in KERNEL32!GetPrivateProfileStructA () from C:\Windows\syswow64\kernel32.dll
(gdb) bt
#0 0x761d13c8 in KERNEL32!GetPrivateProfileStructA () from C:\Windows\syswow64\kernel32.dll
#1 0x005a2b43 in boost::detail::interlocked_read_acquire (x=0x1f7)
at D:/www/lib/platforms/windows/boost/include/boost/thread/win32/interlocked_read.hpp:59
#2 0x005a2bab in boost::detail::basic_recursive_mutex_impl<boost::detail::basic_timed_mutex>::try_recursive_lock (
this=warning: (Internal error: pc 0x1f3 in read in psymtab, but not in symtab.)
0x1f3, current_thread_id=2444)
at D:/www/lib/platforms/windows/boost/include/boost/thread/win32/basic_recursive_mutex.hpp:98
#3 0x005a2bf7 in boost::detail::basic_recursive_mutex_impl<boost::detail::basic_timed_mutex>::lock (this=warning: (Internal error: pc 0x1f3 in read in psymtab, but not in symtab.)
0x1f3)
at D:/www/lib/platforms/windows/boost/include/boost/thread/win32/basic_recursive_mutex.hpp:54
#4 0x00510c35 in boost::lock_guard<boost::recursive_mutex>::lock_guard (this=0xab0f854, m_=...)
at D:/www/lib/platforms/windows/boost/include/boost/thread/locks.hpp:264
#5 0x004fb018 in websocketpp::connection<websocketpp::endpoint<websocketpp::role::server, websocketpp::socket::plain, websocketpp::log::logger>, websocketpp::role::server<websocketpp::endpoint<websocketpp::role::server, websocketpp::socket::plain, websocketpp::log::logger> >::connection, websocketpp::socket::plain<websocketpp::endpoint<websocketpp::role::server, websocketpp::socket::plain, websocketpp::log::logger> >::connection>::send (this=warning: (Internal error: pc 0xf in read in psymtab, but not in symtab.)
0xf, payload=...,
op=websocketpp::frame::opcode::TEXT)
at D:/www/lib/platforms/windows/websocketpp/include/connection.hpp:1549
#6 0x00417518 in CConnection::send (this=0x884af70, session=0x8848980, data=...)
```

I got this websocketpp version from this GitHub repository on October 12 as a zip file. Boost is 1.51. Everything is compiled with -std=c++0x -m32 using MinGW and GCC 4.7.2 (win32 threads) from: http://sourceforge.net/projects/mingwbuilds/files/host-windows/releases/4.7.2/32-bit/threads-win32/sjlj/

Edit: I just cloned the master branch of websocketpp and tried again, but the same errors occur.

@zaphoyd
Owner

zaphoyd commented Mar 25, 2014

Closing this as 0.2.x has known multithreading issues that won't be fixed. Solution is to use 0.3.x if multithreading is important.

@zaphoyd zaphoyd closed this as completed Mar 25, 2014