
rptest: fix max_connections swarm sizing for librdkafka >= 2.10.0 #30848

Merged: travisdowns merged 1 commit into dev from td-omb-max-connections-swarm-sizing on Jun 18, 2026
@travisdowns travisdowns commented Jun 18, 2026

Member

OMBValidationTest.test_max_connections sizes the swarm of producers from an
assumption about how many connections each swarm producer holds. It assumed
num_brokers + 1, where the + 1 was a persistent connection to the bootstrap
broker.

librdkafka removed that separate bootstrap-broker connection in v2.10.0
(confluentinc/librdkafka#4557): brokers are now keyed by id rather than
host:port, so the bootstrap entry is merged into the learned broker list and a
producer ends up holding exactly one connection per broker. client-swarm now
bundles librdkafka >= 2.10.0, so the old + 1 over-estimated the per-producer
connection count and the swarm was provisioned with too few producers to ever
reach the advertised connection target, causing the test to fail with e.g.
Failed to reach target connections, actual: ~18560, target: 24723.

This drops the stale + 1 so conn_per_swarm_producer == num_brokers, which
re-sizes producer_per_swarm_node upward and lets the swarm actually reach the
target connection count.
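
The sizing arithmetic above can be sketched roughly as follows. This is an illustrative sketch, not the test's actual code: the helper names `producers_needed` and `actual_connections` are hypothetical, and the broker count of 3 is an assumption (the PR does not state it). Only the target of 24723 comes from the failure message.

```python
def producers_needed(target_connections: int, conn_per_producer: int) -> int:
    """Producers to provision, assuming each one holds conn_per_producer connections."""
    return target_connections // conn_per_producer


def actual_connections(producers: int, num_brokers: int) -> int:
    """Connections really opened when each producer holds one connection per broker."""
    return producers * num_brokers


target = 24723   # from the failure message in this PR
num_brokers = 3  # ASSUMPTION for illustration; not stated in the PR

# Old heuristic: num_brokers + 1 connections per producer (stale bootstrap connection).
old_producers = producers_needed(target, num_brokers + 1)
old_connections = actual_connections(old_producers, num_brokers)

# New heuristic: exactly num_brokers connections per producer.
new_producers = producers_needed(target, num_brokers)
new_connections = actual_connections(new_producers, num_brokers)

print(old_producers, old_connections)  # 6180 18540
print(new_producers, new_connections)  # 8241 24723
```

Under these assumed numbers the old heuristic tops out at about three quarters of the target, which is in the same range as the ~18560 reported in the failure, while the corrected heuristic provisions enough producers to reach it.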

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

  • none

Copilot AI review requested due to automatic review settings June 18, 2026 20:53
The test sized the swarm assuming each producer holds num_brokers + 1
connections, where the +1 was the persistent bootstrap-broker connection.
librdkafka removed that connection in v2.10.0 (confluentinc/librdkafka#4557)
by keying brokers on id rather than host:port. client-swarm now bundles
librdkafka >= 2.10.0, so each producer holds exactly num_brokers
connections, leaving the swarm under-provisioned and unable to reach the
advertised connection target.

CORE-16659

Copilot AI left a comment

Contributor


Pull request overview

Updates the OMBValidationTest.test_max_connections swarm sizing heuristic to reflect librdkafka behavior changes (>= 2.10.0) where producers no longer maintain an extra persistent bootstrap-broker connection, preventing under-provisioned swarms and connection-target shortfalls in the max-connections cloud validation test.

Changes:

  • Adjusts the assumed per-producer connection count from num_brokers + 1 to num_brokers.
  • Expands the inline comment to document the historical bootstrap connection and the librdkafka v2.10.0 change that removed it.

@ballard26 ballard26 left a comment

Contributor


LGTM, I assume we're only going to be using this with librdkafka versions >= 2.10 right?

@travisdowns

Member Author

> LGTM, I assume we're only going to be using this with librdkafka versions >= 2.10 right?

Basically yes: the swarm version is pinned in this same repo in ducktape-deps, and it was upgraded in #30671 earlier this month.

It got through CI since these don't run in PRs.
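
For context on the version question above: if the sizing ever had to cope with both older and newer librdkafka builds, a version-aware helper might look like the sketch below. This is not what the PR does (it simply drops the + 1 because the swarm version is pinned). The function names are hypothetical, and the version string is assumed to be plain dotted integers such as "2.10.0".

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version like '2.10.0' into (2, 10, 0) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def conn_per_swarm_producer(num_brokers: int, librdkafka_version: str) -> int:
    """Hypothetical helper: expected connections held by one swarm producer."""
    if parse_version(librdkafka_version) >= (2, 10, 0):
        # One connection per broker; no separate bootstrap-broker connection.
        return num_brokers
    # Older clients kept a persistent connection to the bootstrap broker as well.
    return num_brokers + 1
```

Comparing integer tuples rather than raw strings matters here, since "2.8.0" sorts after "2.10.0" as a string but before it as a version.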

@vbotbuildovich

Collaborator

CI test results

test results on build#86005
| test_status | test_class | test_method | test_arguments | test_kind | job_url | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FLAKY(PASS) | ShadowLinkingReplicationTests | test_auto_prefix_trimming | {"source_cluster_spec": {"cluster_type": "redpanda"}, "storage_mode": "cloud", "with_failures": false} | integration | https://buildkite.com/redpanda/redpanda/builds/86005#019edc8b-1887-4806-9b08-54febea33d62 | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0323, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_auto_prefix_trimming |
| FLAKY(PASS) | ClusterQuotaPartitionMutationTest | test_partition_throttle_mechanism | null | integration | https://buildkite.com/redpanda/redpanda/builds/86005#019edc8a-b2eb-4be4-80f7-60e2359dacbd | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0061, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ClusterQuotaPartitionMutationTest&test_method=test_partition_throttle_mechanism |
| FLAKY(PASS) | NodeWiseRecoveryTest | test_recovery_local_data_missing | {"wait_for_final_manifest_uploads": false} | integration | https://buildkite.com/redpanda/redpanda/builds/86005#019edc8a-b2eb-4be4-80f7-60e2359dacbd | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0416, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1197, p1=0.2795, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodeWiseRecoveryTest&test_method=test_recovery_local_data_missing |

@travisdowns travisdowns merged commit a959bb0 into dev Jun 18, 2026
18 checks passed
@travisdowns travisdowns deleted the td-omb-max-connections-swarm-sizing branch June 18, 2026 22:14
@StephanDollberg

Member

BIG. This also means one less thread?


5 participants