@Mark-Simulacrum
Collaborator

Release Summary:

  • opt(s2n-quic-dc): skip epoll registration in happy path of TCP acceptor

Resolved issues:

n/a

Description of changes:

Currently, a dcQUIC stream over TCP is accepted and registered with epoll, and we then attempt a read that usually fails. In <1ms the first data packet arrives and the read succeeds, at which point we deregister the socket and hand the stream off to the application for further reading.

We'd like to avoid the epoll registration because it uses extra CPU (even if the latency impact is minimal), so this patch uses the Linux-only TCP_DEFER_ACCEPT to only accept sockets that already have data available. That's combined with lazy registration of sockets with Tokio's epoll: we only register if we get WouldBlock after attempting a read or write. Based on flamegraphs, epoll registration in the acceptor is no longer visible.
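For readers unfamiliar with the two techniques, here is a minimal, Linux-only sketch of the idea using only the standard library. It is not the code from this patch: `set_defer_accept`, `try_first_read`, and `FirstRead` are hypothetical names, the `setsockopt` binding is declared by hand, and the constant values are assumed from the Linux headers. The actual Tokio registration is left to the caller, which would only perform it on the `NeedsRegistration` path.

```rust
use std::io::{self, Read};
use std::net::{TcpListener, TcpStream};
use std::os::raw::{c_int, c_void};
use std::os::unix::io::AsRawFd;

// Assumed Linux values (<netinet/in.h>, <linux/tcp.h>); not portable.
const IPPROTO_TCP: c_int = 6;
const TCP_DEFER_ACCEPT: c_int = 9;

unsafe extern "C" {
    // Hand-written binding to the C library's setsockopt(2).
    fn setsockopt(
        fd: c_int,
        level: c_int,
        name: c_int,
        value: *const c_void,
        len: u32,
    ) -> c_int;
}

/// Ask the kernel to hold connections back from accept() until the peer
/// has sent data, or until roughly `timeout_secs` has elapsed.
pub fn set_defer_accept(listener: &TcpListener, timeout_secs: c_int) -> io::Result<()> {
    let rc = unsafe {
        setsockopt(
            listener.as_raw_fd(),
            IPPROTO_TCP,
            TCP_DEFER_ACCEPT,
            &timeout_secs as *const c_int as *const c_void,
            std::mem::size_of::<c_int>() as u32,
        )
    };
    if rc == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}

/// Outcome of the optimistic first read on a freshly accepted socket.
pub enum FirstRead {
    /// The read completed without any readiness registration.
    /// A length of 0 means the peer closed the connection.
    Ready(usize),
    /// The read returned WouldBlock; only now would the caller register
    /// the socket with its reactor (Tokio's epoll in the real acceptor).
    NeedsRegistration,
}

/// Try a non-blocking read first and report whether registration is needed.
pub fn try_first_read(stream: &mut TcpStream, buf: &mut [u8]) -> io::Result<FirstRead> {
    stream.set_nonblocking(true)?;
    match stream.read(buf) {
        Ok(n) => Ok(FirstRead::Ready(n)),
        Err(e) if e.kind() == io::ErrorKind::WouldBlock => Ok(FirstRead::NeedsRegistration),
        Err(e) => Err(e),
    }
}
```

With the option set on the listener, the accepted socket is expected to already hold the first data packet, so the happy path takes the `Ready` branch and never touches epoll. The `NeedsRegistration` branch remains as the fallback, for example when the defer timeout expires before data arrives.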

The net effect is an 8.8% (relative) drop in overall CPU usage in one of our internal benchmarks, which exercises short streams over loopback, bringing CPU usage in the acceptor from 23% of the workload to 18%.

Call-outs:

n/a

Testing:

The new code is semantically a no-op and is exercised by existing tests (see the updated timeout/sleep).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@Mark-Simulacrum Mark-Simulacrum marked this pull request as ready for review August 14, 2025 12:28
@Mark-Simulacrum Mark-Simulacrum merged commit ff81604 into aws:main Aug 14, 2025
120 checks passed
@Mark-Simulacrum Mark-Simulacrum deleted the defer-accept branch August 14, 2025 19:18