@zimatars
Contributor

@zimatars zimatars commented Jan 6, 2026

This PR addresses issue #4050.

When using HTTP/2, concurrent requests should prefer multiplexing on an existing connection (until max concurrent streams is reached). There is a timing window in Http2Pool#drainLoop where a selected slot is temporarily not visible to other borrowers until the async deliver runs on the Channel EventLoop. During that window, concurrent acquires may allocate an extra connection.

Change: Reserve stream capacity and re-offer the selected slot before scheduling the async deliver, so concurrent borrowers can still see and reuse the connection.
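The idea of reserving a stream and re-offering the slot before the asynchronous deliver can be pictured with a minimal, single-threaded sketch. `ReserveAndReoffer`, `Slot`, `tryReserve` and `acquire` are hypothetical names used only for illustration, not the `Http2Pool` API. In the real pool the deliver runs later on the Channel EventLoop, whereas here it is a `Runnable` invoked inline.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of "reserve, re-offer, then deliver".
// Not the Http2Pool implementation; names are illustrative only.
final class ReserveAndReoffer {

    // A connection slot with a bounded number of concurrent streams.
    static final class Slot {
        final int id;
        final int maxStreams;
        final AtomicInteger activeStreams = new AtomicInteger();

        Slot(int id, int maxStreams) {
            this.id = id;
            this.maxStreams = maxStreams;
        }

        // Atomically reserves one stream; returns false if the slot is already saturated.
        boolean tryReserve() {
            while (true) {
                int current = activeStreams.get();
                if (current >= maxStreams) {
                    return false;
                }
                if (activeStreams.compareAndSet(current, current + 1)) {
                    return true;
                }
            }
        }
    }

    final ConcurrentLinkedQueue<Slot> idle = new ConcurrentLinkedQueue<>();

    // Returns the slot handed to this borrower, or null if a new connection is needed.
    Slot acquire(Runnable scheduleDeliver) {
        Slot slot;
        while ((slot = idle.poll()) != null) {
            if (slot.tryReserve()) {
                // Re-offer BEFORE scheduling delivery so concurrent borrowers can still see it.
                if (slot.activeStreams.get() < slot.maxStreams) {
                    idle.offer(slot);
                }
                scheduleDeliver.run(); // stands in for the async deliver on the EventLoop
                return slot;
            }
            // Saturated slot: this model simply drops it (no stream-close handling).
        }
        return null;
    }

    public static void main(String[] args) {
        ReserveAndReoffer pool = new ReserveAndReoffer();
        pool.idle.offer(new Slot(1, 2)); // one connection, max 2 concurrent streams
        Slot a = pool.acquire(() -> { });
        Slot b = pool.acquire(() -> { });
        Slot c = pool.acquire(() -> { });
        System.out.println(a.id + " " + b.id + " " + (c == null ? "allocate" : c.id));
        // prints "1 1 allocate"
    }
}
```

Because the slot goes back into the queue before delivery is scheduled, the second acquire reuses connection 1. Only once both of its streams are reserved does the third acquire fall through to allocating a new connection.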

@zimatars zimatars force-pushed the http2pool-acquire-window branch from 419ed9b to 867cc89 on January 6, 2026 16:41
@zimatars zimatars changed the title from "Improve Http2Pool concurrent acquire reuse (H2 multiplexing)" to "Improve Http2Pool connection reuse for concurrent acquires" Jan 6, 2026
@violetagg violetagg added this to the 1.3.3 milestone Jan 9, 2026
@violetagg violetagg added the type/enhancement A general enhancement label Jan 9, 2026
@violetagg
Member

@zimatars Can you please rebase in order to pick up #4061? It is relevant to the changes in this PR.

@zimatars zimatars force-pushed the http2pool-acquire-window branch from 867cc89 to 9a1af17 on January 17, 2026 14:13
@zimatars
Contributor Author

@violetagg I’ve rebased this PR on the latest main and included #4061. Could you please check it when you have a chance?

@zimatars zimatars force-pushed the http2pool-acquire-window branch from 9a1af17 to 6c758c4 on January 17, 2026 15:33
@violetagg
Member

@zimatars I think we should apply this also to the branch when a connection is established. What do you think?

@zimatars
Contributor Author

zimatars commented Jan 23, 2026

Hi @violetagg, thanks for the suggestion. I agree this should also cover the allocation path (when a connection is established).

On the “connection is established” path, changing deliver(...) alone doesn't seem sufficient. The allocation decision is made earlier in drainLoop(), so there is still an unavoidable gap between “permit granted / allocation in-flight” and “slot becomes visible in the queue”. A concurrent acquire that runs during this gap can observe findConnection(...) == null while permitGranted > idleSize (permit count includes in-flight allocations, while idle size is what is currently visible in the queue) and trigger another allocation.

Proposed approach (drainLoop allocation decision)

Instead of adding more logic to deliver(...) in the “new connection established” branch, I was thinking of adjusting the allocation decision in drainLoop():

  1. Capture resourcesCount = idleSize before scanning the queue.
  2. Call findConnection(resources, resourcesCount) (or keep findConnection(resources) but use the captured resourcesCount for the decision).
  3. If no slot is found and permitGranted > resourcesCount, treat it as “allocations / deliveries are already in-flight (a connection exists but is temporarily not visible)” and do not allocate a new connection in this iteration. The next drain() will be triggered once the in-flight deliver/allocation completes.
  4. Otherwise, keep the existing behavior in the slot == null path unchanged (permit checks + allocate when needed).

Scenarios

  • Slot temporarily not visible: resourcesCount == 0 while permitGranted > 0 → a connection/slot exists, but it is temporarily not visible in the queue (selected by another borrower or allocation in-flight) → wait instead of allocating another connection.
  • No eligible slot in the queue: resourcesCount > 0 and permitGranted == resourcesCount but findConnection == null → all visible slots were filtered out by findConnection (e.g. reached max streams, GO_AWAY/eviction predicate/closed/inactive, etc.) and there is no in-flight allocation → proceed with allocating a new connection (subject to permits).
  • Connection is being established: permitGranted > resourcesCount → permit was granted, but the new connection/slot is not visible in the queue yet → avoid starting another allocation in parallel.
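The proposed decision and the three scenarios can be expressed as a small pure function. This is a minimal sketch under stated assumptions: the idle queue is only read, not drained (the real findConnection(...) may remove and re-offer slots), and the existing permit checks are reduced to a hypothetical `permitGranted < maxConnections` test. `DrainDecision`, `Decision` and `maxConnections` are illustrative names, not reactor-netty API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical model of the proposed drainLoop() allocation decision.
final class DrainDecision {

    enum Decision { USE_SLOT, WAIT_FOR_IN_FLIGHT, ALLOCATE, NO_PERMIT }

    /**
     * @param idle           slots currently visible in the queue
     * @param eligible       stand-in for the filtering done by findConnection(...)
     * @param permitGranted  permits granted so far, including in-flight allocations
     * @param maxConnections permit limit (hypothetical parameter)
     */
    static <S> Decision decide(Deque<S> idle, Predicate<S> eligible,
                               int permitGranted, int maxConnections) {
        // 1. Capture the visible count BEFORE scanning (resourcesCount = idleSize).
        int resourcesCount = idle.size();

        // 2. Scan for an eligible slot.
        for (S slot : idle) {
            if (eligible.test(slot)) {
                return Decision.USE_SLOT;
            }
        }

        // 3. Nothing eligible, but more permits than visible slots:
        //    a connection exists or is being established, so do not allocate now.
        if (permitGranted > resourcesCount) {
            return Decision.WAIT_FOR_IN_FLIGHT;
        }

        // 4. Unchanged path: allocate only if a permit is still available.
        return permitGranted < maxConnections ? Decision.ALLOCATE : Decision.NO_PERMIT;
    }

    public static void main(String[] args) {
        Deque<String> empty = new ArrayDeque<>();
        Deque<String> saturated = new ArrayDeque<>();
        saturated.add("conn-1 (max streams reached)");
        Predicate<String> none = s -> false;

        // Slot temporarily not visible, or connection being established.
        System.out.println(decide(empty, none, 1, 10));     // WAIT_FOR_IN_FLIGHT
        // No eligible slot in the queue and nothing in flight.
        System.out.println(decide(saturated, none, 1, 10)); // ALLOCATE
        // Very first acquire on an empty pool.
        System.out.println(decide(empty, none, 0, 10));     // ALLOCATE
    }
}
```

The only new branch is step 3; a borrower that gets `WAIT_FOR_IN_FLIGHT` stays pending and is served by the next drain() triggered when the in-flight deliver/allocation completes.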

If you think this approach makes sense, I'm happy to update the PR accordingly.

@violetagg
Member

@zimatars What do you think about pushing this as is and having you create a new PR for the connection established scenario?

@zimatars
Contributor Author

@violetagg I was planning to use the same drainLoop() allocation decision adjustment to cover both the existing-connection visibility window and the “connection is established” window. With that, I think the extra deliver(...) changes in my commit can be reverted (keeping the main/#4061 deliver semantics unchanged) and we can keep the PR focused on the drainLoop() side. Would that be acceptable?

@violetagg
Member

ok

@zimatars zimatars force-pushed the http2pool-acquire-window branch from 6c758c4 to 20b048f on January 25, 2026 16:03
@zimatars
Contributor Author

@violetagg I’ve pushed an updated implementation in this PR. I reverted the earlier deliver(...) changes and instead adjusted the allocation decision in Http2Pool#drainLoop() so it covers both:

  • the selected-slot not visible window (slot temporarily removed from the queue until the async deliver runs), and
  • the new-connection established window (permit granted / allocation in-flight before the new slot becomes visible).

Changes in drainLoop()

In the slot == null branch, we detect “there is already an in-flight allocation/delivery” by comparing:

  • permitGranted (includes in-flight allocations), vs
  • resourcesCount (what is currently visible in the queue, captured from idleSize)

If permitGranted > resourcesCount, we treat it as “a connection/slot exists but is temporarily not visible”, so we don’t start another allocation in this drain iteration. The next drain() is triggered once the in-flight deliver/allocation completes.

Behavior differences

This makes HTTP/2 prefer multiplexing more consistently, but it changes a few edge behaviors:

  1. Pool from 0 → 1 (first connection):

    • Concurrent acquires will wait for the first connection to be established and delivered instead of racing to create multiple connections in parallel.
  2. Existing connection temporarily not visible (selected by another borrower, before async deliver):

    • Concurrent acquires will wait for the selected slot to be re-offered / delivered, instead of allocating an extra connection during the window.
  3. Max concurrent streams reached:

    • If all visible connections are at max streams, we allocate one new connection (subject to permits), then concurrent acquires wait for that new connection to become visible/delivered.
    • We only allocate additional connections once the newly created one is also saturated (or otherwise not eligible).
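The effect of these behaviors on the number of connections can be checked with a sequential, hypothetical model. It assumes all borrowers arrive at once, no stream is released during the run, and every connection advertises the same max concurrent streams. Under those assumptions the pool opens ceil(borrowers / maxStreams) connections, capped by the permit limit, instead of one per racing acquire. `MultiplexingSimulation` and its parameters are illustrative names, not reactor-netty code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, sequential model of the post-change allocation policy.
// It only counts how many connections would be opened.
final class MultiplexingSimulation {

    /**
     * @param borrowers      number of concurrent acquires
     * @param maxStreams     per-connection max concurrent streams (assumed uniform)
     * @param maxConnections permit limit of the pool
     * @return number of connections opened while serving the borrowers
     */
    static int connectionsOpened(int borrowers, int maxStreams, int maxConnections) {
        List<Integer> visible = new ArrayList<>(); // active streams per visible connection
        int pending = borrowers;
        boolean allocationInFlight = false;
        int opened = 0;

        while (pending > 0) {
            // Serve pending borrowers from visible connections that still have capacity.
            for (int i = 0; i < visible.size() && pending > 0; i++) {
                int take = Math.min(maxStreams - visible.get(i), pending);
                visible.set(i, visible.get(i) + take);
                pending -= take;
            }
            if (pending == 0) {
                break;
            }
            if (allocationInFlight) {
                // permitGranted > visible connections: wait until the new one becomes visible.
                visible.add(0);
                allocationInFlight = false;
            } else if (opened < maxConnections) {
                // All visible connections saturated and nothing in flight: allocate ONE.
                opened++;
                allocationInFlight = true;
            } else {
                break; // out of permits: remaining borrowers stay pending
            }
        }
        return opened;
    }

    public static void main(String[] args) {
        System.out.println(connectionsOpened(5, 100, 10)); // first connection is shared by all
        System.out.println(connectionsOpened(5, 2, 10));   // a new one only after saturation
    }
}
```

With 5 borrowers and a max of 2 streams per connection, the model opens 3 connections (ceil(5 / 2)); with 100 streams it opens a single connection, matching the 0 → 1 case.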

Test updates

Because the pool now multiplexes more aggressively, two tests needed to be updated:

  • HttpServerTests#doTestGracefulShutdown: HTTP/2 GOAWAY is connection-level, and with multiplexing both concurrent requests can share a single connection, so the test should assert at least one GOAWAY rather than two.
  • Http2PooledConnectionProviderCustomMetricsTest#measureActiveStreamsSize: the test previously used doOnConnected as a proxy for “5 connections”, but with HTTP/2 multiplexing the 5 requests can share one connection with 5 streams. The test now waits for metrics to observe activeStreamSize == 5.
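The key point of the metrics test change is to assert on the observed stream count rather than on the number of connections. A minimal, generic polling sketch of "wait until the gauge reports 5" follows; `AwaitMetric` and `await` are hypothetical helpers and the `AtomicInteger` only stands in for the pool's active-streams metric. The actual test may use a different waiting mechanism (for example a library such as Awaitility).

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper; not the helper used by the real test.
final class AwaitMetric {

    // Polls `condition` until it holds or `timeout` elapses; returns whether it held.
    static boolean await(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a gauge such as the pool's active-streams metric.
        AtomicInteger activeStreams = new AtomicInteger();
        Thread requests = new Thread(() -> {
            for (int i = 0; i < 5; i++) {
                activeStreams.incrementAndGet(); // five streams, possibly on ONE connection
            }
        });
        requests.start();
        boolean observed = await(() -> activeStreams.get() == 5,
                Duration.ofSeconds(5), Duration.ofMillis(10));
        requests.join();
        System.out.println("activeStreams == 5 observed: " + observed);
    }
}
```

Waiting on the stream gauge keeps the assertion valid whether the 5 requests land on one connection or several.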

Labels

type/enhancement A general enhancement

Development

Successfully merging this pull request may close these issues.

Http2Pool: concurrent acquires may allocate extra connections due to async deliver window

2 participants