Enable CPU compression offload for EB mode in Thrift server
Summary:
D97565705 moved Thrift server response compression from IO threads to CPU threads, but only when executor_ is non-null. In EB mode, executor_ is null (the defaultSync resource pool has no executor), so compression still runs on the IO thread and EB handlers get no benefit from the offload.
This diff fixes the EB mode gap by computing the compression executor as a local variable in sendReply() with a fallback chain: executor_ (TM mode) → server's handler executor via the context chain (EB mode) → folly::getGlobalCPUExecutor() (safety net). The dispatch methods now accept the executor as a parameter instead of reading executor_ directly.
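The fallback chain described above can be sketched roughly as follows. This is a minimal illustration, not the code in this diff: `Executor`, `ServerContext`, `globalCpuExecutor()`, and `pickCompressionExecutor()` are hypothetical stand-ins, with `globalCpuExecutor()` playing the role of folly::getGlobalCPUExecutor() and `ServerContext::handlerExecutor` standing in for whatever the context chain exposes in EB mode.

```cpp
#include <cassert>

// Hypothetical stand-in for an executor; the real Thrift/folly types are not shown here.
struct Executor {
  const char* name;
};

// Hypothetical stand-in for the server state reachable through the context chain.
struct ServerContext {
  Executor* handlerExecutor = nullptr;
};

// Stand-in for folly::getGlobalCPUExecutor(): a process-wide executor of last resort.
inline Executor* globalCpuExecutor() {
  static Executor global{"global-cpu"};
  return &global;
}

// Resolve the compression executor as a local value. Nothing is stored or
// mutated, so a null requestExecutor (EB mode) stays null for the caller.
inline Executor* pickCompressionExecutor(
    Executor* requestExecutor, const ServerContext* ctx) {
  if (requestExecutor != nullptr) {
    return requestExecutor;  // TM mode: the request already has an executor
  }
  if (ctx != nullptr && ctx->handlerExecutor != nullptr) {
    return ctx->handlerExecutor;  // EB mode: borrow the server's handler executor
  }
  Executor* fallback = globalCpuExecutor();  // safety net
  assert(fallback != nullptr);
  return fallback;
}
```

Because the result is passed to the dispatch methods as a parameter, the sketch never needs to write to the member it reads, which mirrors the "executor_ is never mutated" property.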
## Key properties:
- No new members on any object, no new virtual methods
- executor_ is never mutated — EB method semantics are unchanged
- The fallback chain (~4ns) only runs when the flag is on, the payload exceeds the compression threshold, and we're on the IO thread in EB mode. That cost is negligible compared to the compression work it enables (microseconds to milliseconds)
- Gated by the existing thrift_server_compress_response_on_cpu flag
## Benchmark Results: CPU Compression Offload (Echo32k_semi_random_eb)
| Metric | Baseline | Server Offload | Change |
|---|---|---|---|
| Average QPS | 27,824 | 50,614 | +82% |
| p50 Latency (ms) | 5.352 | 3.153 | -41% |
| p99 Latency (ms) | 10.275 | 4.196 | -59% |
| p100 Latency (ms) | 26.030 | 11.664 | -55% |
| Server CPU Utilization | 2.27 | 4.89 | +115% |
| Client CPU Utilization | 2.37 | 4.04 | +71% |
### Summary
Offloading compression to CPU threads delivers an **82% QPS improvement** and cuts **p99 latency by 59%** for semi-compressible 32KB payloads on EB-thread handlers. The trade-off is higher CPU utilization (+115% server-side), which is expected — the IO threads are no longer blocked by compression and can accept requests faster, driving more total throughput. The CPU threads absorb the compression work in parallel, converting idle CPU capacity into lower latency and higher throughput.
### Limitations
- **IO thread saturation required.** The feature only helps when IO threads are the bottleneck. If IO threads have spare capacity, inline compression is fast enough and the dispatch overhead provides no benefit.
- **Thread-hop cost.** Each dispatched response pays a fixed overhead for executor queue insertion, CPU thread dequeue, reply queue notification (eventfd syscall), and IO thread wakeup. This fixed cost is independent of payload size, so it becomes proportionally less significant for larger payloads.
- **Minimum payload size.** Payloads must be large enough that compression time significantly exceeds the thread-hop overhead. A minimum threshold of 1KB (`thrift_server_min_cpu_compression_payload_size`) is enforced to prevent small responses (e.g., pings) from being dispatched at a net loss.
- **Data compressibility matters.** The feature benefits semi-compressible data (structured Thrift responses, JSON-like content) where compression is both CPU-expensive and effective at reducing wire size. Trivially compressible data (repeated bytes) compresses too fast to justify the hop. Incompressible data (random bytes) gains nothing from compression and bottlenecks on network IO instead.
### Two-threshold interaction
There are two independent size thresholds that gate compression behavior. They serve different purposes and do not conflict:
- `compressionSizeLimit` (existing, per-connection) — configured via the client's compression config (compressionConfig_.compressionSizeLimit()). Controls whether compression happens at all. Payloads at or below this limit skip compression entirely (no algorithm is selected). This threshold is unchanged by this diff.
- `thrift_server_min_cpu_compression_payload_size` (new, global flag, default 1024 bytes): controls where compression runs (CPU thread vs inline on IO thread). Payloads below this threshold are still compressed if they exceed `compressionSizeLimit`, but inline on the current thread rather than being dispatched to a CPU executor. This avoids the thread-hop overhead for small payloads where inline compression is cheaper than the dispatch cost.
Evaluation order in `shouldDispatchCompressionToCpu(payloadSize)`:
- If payloadSize < `thrift_server_min_cpu_compression_payload_size` → no dispatch; the payload is compressed inline if it is otherwise eligible
- If `getEligibleCompressionAlgorithm(payloadSize)` returns nullopt (no algorithm, or payload ≤ `compressionSizeLimit`) → no compression at all
- Otherwise → dispatch compression to CPU thread
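As a rough sketch of how the two thresholds interact, the checks above could be modeled like this. Only the function names `shouldDispatchCompressionToCpu` and `getEligibleCompressionAlgorithm` and the flag names come from this summary; the `Config` struct, the `Algorithm` and `Plan` enums, the `planCompression()` helper, and all signatures are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

enum class Algorithm { Zstd };  // hypothetical; any negotiated algorithm would do
enum class Plan { NoCompression, CompressInline, DispatchToCpu };

// Hypothetical bundle of the settings discussed above.
struct Config {
  bool compressOnCpu = false;            // models thrift_server_compress_response_on_cpu
  std::size_t minCpuPayloadSize = 1024;  // models thrift_server_min_cpu_compression_payload_size
  std::size_t compressionSizeLimit = 0;  // models the per-connection compressionSizeLimit
  std::optional<Algorithm> negotiated;   // empty when no algorithm is configured
};

// Whether compression happens at all: payloads at or below the limit are skipped.
inline std::optional<Algorithm> getEligibleCompressionAlgorithm(
    const Config& cfg, std::size_t payloadSize) {
  if (!cfg.negotiated || payloadSize <= cfg.compressionSizeLimit) {
    return std::nullopt;
  }
  return cfg.negotiated;
}

// Where compression runs: same evaluation order as the list above.
inline bool shouldDispatchCompressionToCpu(const Config& cfg, std::size_t payloadSize) {
  if (!cfg.compressOnCpu) {
    return false;  // feature not opted in
  }
  if (payloadSize < cfg.minCpuPayloadSize) {
    return false;  // too small to be worth the thread hop
  }
  return getEligibleCompressionAlgorithm(cfg, payloadSize).has_value();
}

// Hypothetical helper combining both decisions into one outcome.
inline Plan planCompression(const Config& cfg, std::size_t payloadSize) {
  if (shouldDispatchCompressionToCpu(cfg, payloadSize)) {
    return Plan::DispatchToCpu;
  }
  return getEligibleCompressionAlgorithm(cfg, payloadSize) ? Plan::CompressInline
                                                           : Plan::NoCompression;
}
```

With `compressionSizeLimit = 512` and the flag on, a 256-byte payload is not compressed, an 800-byte payload is compressed inline, and a 4096-byte payload is dispatched. The two thresholds answer different questions (whether versus where), so neither overrides the other.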
This does not change existing behavior. Both thresholds are only evaluated when `thrift_server_compress_response_on_cpu` is true (default false). Services that have not opted in see zero behavior change. For services that have opted in, the new minimum size threshold adds a small-payload bypass that wasn't previously needed (because the prior code only dispatched when executor_ was non-null, which excluded EB mode entirely).
Reviewed By: robertroeser
Differential Revision: D100902596
fbshipit-source-id: 583199eb8d05d14af0a5119a7cf72a17736b91c3