Any update on indefinite hangs in destructor? #196
In my case a similar problem happens when my instance is unable to connect to a broker (while testing I simply stop the Kafka docker container). This causes the Producer to retry sending the message over and over, locking up php-fpm in the process. In the normal case, when Kafka is online, the process is freed properly once the message is acknowledged.
I think that the problem lies in an "infinite" loop inside the producer's destructor. The loop looks similar to the one in librdkafka's flush implementation. I think that polling in the destructor is not the best solution and it would be better to expose flushing to user land instead. Currently, as a workaround, I added a custom flush() method:

public function flush(int $timeoutInMilliseconds): void
{
// Make at least one non-blocking poll
$this->vendorProducer->poll(0);
$endTime = microtime(true) + $timeoutInMilliseconds / 1000;
$remainingTimeInMilliseconds = $timeoutInMilliseconds;
while ($this->vendorProducer->getOutQLen() > 0 && $remainingTimeInMilliseconds > 0) {
$this->vendorProducer->poll($remainingTimeInMilliseconds);
$remainingTimeInMilliseconds = (int) (($endTime - microtime(true)) * 1000);
}
if ($this->vendorProducer->getOutQLen() > 0) {
throw new \RuntimeException('Flush timed out');
}
}

Then I can call the flush() method explicitly, with a timeout of my choosing.
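For illustration, a hypothetical caller of that workaround ($producerWrapper and its send() method are invented for this example; only flush() corresponds to the code above):

try {
    // Produce the message through the wrapper, then wait up to 10s for delivery.
    $producerWrapper->send('my-topic', 'payload');
    $producerWrapper->flush(10000);
} catch (\RuntimeException $e) {
    // Delivery did not complete in time; log it and decide whether to retry or drop.
    error_log('Kafka flush timed out: ' . $e->getMessage());
}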
Thanks for the analysis @sustmi. The current behavior (poll until the queue is empty) is safe by default, as it will attempt to send the messages until they are actually sent. I think that this is a good behavior to retain, though I agree that there should be a way to quit earlier. If we expose flush() and purge(), users who want to quit earlier can call them explicitly before the producer is destroyed.

WDYT?
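For illustration, a rough sketch of the shutdown sequence this would enable, assuming flush() and purge() are exposed with the same semantics as librdkafka's rd_kafka_flush()/rd_kafka_purge() (the method and constant names below are taken from those APIs and are assumptions here):

// Sketch only: assumes RdKafka\Producer::flush() and ::purge() are exposed.
$result = $producer->flush(10000); // wait up to 10s for outstanding delivery reports
if ($result !== RD_KAFKA_RESP_ERR_NO_ERROR) {
    // Give up: drop queued and in-flight messages so the destructor has nothing to wait for.
    $producer->purge(RD_KAFKA_PURGE_F_QUEUE | RD_KAFKA_PURGE_F_INFLIGHT);
    // Serve the delivery-report callbacks generated for the purged messages.
    $producer->poll(0);
}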
@arnaud-lb Building on what was previously mentioned, I think it would be important that at least all the callbacks that can be triggered from poll can still be executed (so the application can handle errors). If this is not possible, I would actually get rid of the poll in the destructor, adjust all the examples, and really stress that we need to flush and purge for a proper shutdown.
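For reference, a minimal sketch of registering such callbacks via php-rdkafka's Conf (assuming setDrMsgCb()/setErrorCb() as documented for the extension; the handling inside the callbacks is only illustrative):

$conf = new RdKafka\Conf();

// Delivery report callback: runs from poll()/flush() once per produced message.
$conf->setDrMsgCb(function ($kafka, $message) {
    if ($message->err !== RD_KAFKA_RESP_ERR_NO_ERROR) {
        error_log('Delivery failed: ' . $message->errstr());
    }
});

// Generic error callback, e.g. transport errors while all brokers are down.
$conf->setErrorCb(function ($kafka, $err, $reason) {
    error_log(sprintf('Kafka error %s: %s', rd_kafka_err2str($err), $reason));
});

$producer = new RdKafka\Producer($conf);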
Unfortunately, there is no guarantee that callbacks are going to be called during destruction, because the callbacks themselves might have been destroyed by the GC already at this point. Here are the options I can think of:

Option 1: Don't poll in the destructor, and let users poll/flush manually. If the user forgets to do so, messages are lost.
@arnaud-lb In my opinion we should go with Option 1 (we could always adapt and add Option 2 back if we see that too many people have problems). My reasoning for Option 1: if we don't poll/flush, we can never be sure whether poll would trigger a callback that is unhandled, which could result in lost messages as well. So I think it is not the extension's responsibility to take care of this, since it is very hard to accomplish in a good way.
Simple solutions are probably best. This option gives the programmer the most control in my opinion and should be preferred. It is explicit and shows exactly what is happening. The current behavior is difficult to explain to newcomers, since it involves "hidden" operations that may or may not occur (e.g. when a message cannot be delivered). Actually, I believe it will prevent more issues than it might introduce: what usually happens is that the programmer does no error checking and is eventually surprised when inevitable connectivity issues occur and PHP processes start to clog up, retrying until destroyed and/or losing messages. Enforcing proper error handling in user land is the way to go for me.
Agreed, the current behavior is confusing. Let's go for Option 1. We have to document this change and release it in a new major version, though.
Hi,
Is there an update on the indefinite hangs in the destructor when no brokers are available?
I tried several options but can't seem to get past this issue.
It's affecting my production environment at the moment when Kafka is restarting or all the brokers are down.
You can reproduce it very easily with the following code:
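A minimal sketch of that kind of reproduction, assuming the php-rdkafka Producer API and a broker address with nothing listening on it:

// Point the producer at a broker that is down (e.g. a stopped Kafka container).
$conf = new RdKafka\Conf();
$conf->set('metadata.broker.list', '127.0.0.1:9092'); // nothing listening here

$producer = new RdKafka\Producer($conf);
$topic = $producer->newTopic('test');

// The message stays in the out queue because no broker ever acknowledges it.
$topic->produce(RD_KAFKA_PARTITION_UA, 0, 'hello');

// Script ends here; the destructor keeps polling until the queue is empty,
// which never happens while the brokers are unreachable.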
Afterwards rdkafka will be stuck in the destructor loop.
Thanks!
Regards
J