
[Bug] Creating producer or consumer doesn't retry on temporary errors #391

Closed
@BewareMyPower

Description


Search before asking

  • I searched in the issues and found nothing similar.

Version

Pulsar: 3.0.1.4
C++ Client: 3.4.2

Minimal reproduce step

It happens in a stress test.

What did you expect to see?

When the broker is temporarily unavailable (e.g. the SSL handshake fails), the client should retry creating producers or consumers.

What did you see instead?

There are a lot of `ResultConnectError` errors from `createProducerAsync`, together with many `Handshake failed: stream truncated` and `Connection closed with ConnectError` logs:

```
2024-01-25T00:23:02.223Z E [<local_ip>:53420 -> <remote_ip>:6651] Connection closed with ConnectError (refCnt: 2)
2024-01-25T00:23:02.223Z E [<local_ip>:53420 -> <remote_ip>:6651] Handshake failed: stream truncated
2024-01-25T00:23:02.287Z E [<local_ip>:53488 -> <remote_ip>:6651] Handshake failed: stream truncated
2024-01-25T00:23:02.288Z E [<local_ip>:53488 -> <remote_ip>:6651] Connection closed with ConnectError (refCnt: 2)
2024-01-25T00:23:02.323Z E [<local_ip>:53538 -> <remote_ip>:6651] Connection closed with ConnectError (refCnt: 2)
2024-01-25T00:23:02.323Z E [<local_ip>:53538 -> <remote_ip>:6651] Handshake failed: stream truncated
2024-01-25T00:23:02.430Z E [<local_ip>:53730 -> <remote_ip>:6651] Connection closed with ConnectError (refCnt: 2)
2024-01-25T00:23:02.430Z E [<local_ip>:53730 -> <remote_ip>:6651] Handshake failed: stream truncated
2024-01-25T00:23:02.485Z E [<local_ip>:53798 -> <remote_ip>:6651] Connection closed with ConnectError (refCnt: 2)
2024-01-25T00:23:02.485Z E [<local_ip>:53798 -> <remote_ip>:6651] Handshake failed: stream truncated
2024-01-25T00:23:02.521Z E [<local_ip>:53886 -> <remote_ip>:6651] Handshake failed: stream truncated
2024-01-25T00:23:02.521Z E [<local_ip>:53886 -> <remote_ip>:6651] Connection closed with ConnectError (refCnt: 2)
2024-01-25T00:23:02.697Z E [<local_ip>:54094 -> <remote_ip>:6651] Handshake failed: stream truncated
2024-01-25T00:23:02.698Z E [<local_ip>:54094 -> <remote_ip>:6651] Connection closed with ConnectError (refCnt: 2)
2024-01-25T00:23:02.812Z E [<local_ip>:54280 -> <remote_ip>:6651] Connection closed with ConnectError (refCnt: 2)
2024-01-25T00:23:02.812Z E [<local_ip>:54280 -> <remote_ip>:6651] Handshake failed: stream truncated
2024-01-25T00:23:02.824Z W Error creating topic producer for <topic-1>: 5
2024-01-25T00:23:02.824Z E [<local_ip>:54350 -> <remote_ip>:6651] Connection closed with ConnectError (refCnt: 2)
2024-01-25T00:23:02.824Z E [<local_ip>:54350 -> <remote_ip>:6651] Handshake failed: stream truncated
2024-01-25T00:23:02.838Z E [<local_ip>:54386 -> <remote_ip>:6651] Handshake failed: stream truncated
2024-01-25T00:23:02.824Z W Error creating topic producer for <topic-2>: 5
```

Error code 5 means ResultConnectError.

Anything else?

This happens because, when the handshake fails, the `ClientConnection` is closed with `ResultConnectError` (the default result of `close()`):

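```cpp
void ClientConnection::handleHandshake(const ASIO_ERROR& err) {
    if (err) {
        LOG_ERROR(cnxString_ << "Handshake failed: " << err.message());
        // No Result argument: close() falls back to its default, ResultConnectError
        close();
        return;
    }
    // ... (rest of handleHandshake omitted)
}
```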

Then `ProducerImpl::connectionFailed` is called with `ResultConnectError`. If the producer has not finished being created, its creation fails immediately with that result:

```cpp
// Excerpt from ProducerImpl::connectionFailed (surrounding code omitted)
} else if (producerCreatedPromise_.setFailed(result)) {
```

Are you willing to submit a PR?

  • I'm willing to submit a PR!
