Version
1.45.0 (also reproduced with 1.48.0, so the regression is still present there).
1.44.x is fine, and so is 1.43.x.
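For reference, the only difference between the compared binaries is the pinned tokio version. A Cargo.toml fragment along these lines (hypothetical, the exact manifest is latte's) selects the version under test; the `=x.y.z` requirement forbids semver bumps so the comparison is exact:

```toml
[dependencies]
# Pin tokio exactly for A/B testing the regression.
tokio = { version = "=1.44.2", features = ["full"] }
# Swap in the suspect version for the "bad" build:
# tokio = { version = "=1.45.0", features = ["full"] }
```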
Platform
Linux latte-perf-debug-dev-loader-node-b2e23059-1 6.14.0-1017-aws #17~24.04.1-Ubuntu SMP Wed Nov 5 10:48:17 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Description
Tokio 1.45.0 and later introduce a significant performance regression, which we noticed using (our fork of) the Scylla/Cassandra benchmarking tool latte.
Results for latte 0.32.0 with tokio 1.44.2
ubuntu@latte-perf-debug-dev-loader-node-b2e23059-1:~/latte_karol$ ./latte_0.32.0-tokio1.44.2 run ~/latte/simple.rn 10.0.10.196 -P keyspace="\"lattetablets\"" -P tablets=true -q -P row_count=50100100 -d=30s --consistency=LOCAL_QUORUM --threads=60 --concurrency=64 --connections=1 --s>
info: Loading workload script /home/ubuntu/latte/simple.rn...
info: Connecting to ["10.0.10.196"]...
info: Connected to '' cluster running ScyllaDB 2025.3.3-0.20251029.0e6381f14db2 with build-id 05d55ab705a3d90b467774912f5eaa5ff495f1ee
info: Preparing...
info: Warming up...
info: Running benchmark...
CONFIG ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Date Tue, 25 Nov 2025
Time 14:49:48 +0000
Cluster
Datacenter
Rack
DB version ScyllaDB 2025.3.3-0.20251029.0e6381f14db2 with build-id 05d55ab705a3d90b467774912f5eaa5ff495f1ee
Workload simple.rn
Function(s) write:1, read:4
Consistency LocalQuorum
Tags
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-P keyspace "lattetablets"
-P row_count 50100100
-P tablets true
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Threads 60
Connections 1
Concurrency [req] 64
Max rate [op/s]
Warmup [s]
└─ [op] 1
Run time [s] 30.0
└─ [op]
Sampling [s] 5.0
└─ [op]
Request timeout [s] 7
Retries
┌──────┴number 10
├─min interval [ms] 1000
└─max interval [ms] 5000
LOG ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Time Cycles Errors Thrpt. ────────────────────────────────── Latency [ms/op] ──────────────────────────────
[s] [op] [op] [op/s] Min 50 75 90 95 99 99.9 Max
5.002 7464708 0 1492498 0.216 2.232 3.049 3.916 4.608 8.593 34.898 65.831
10.000 7576099 0 1515262 0.235 2.265 3.056 3.863 4.456 7.115 30.654 79.430
15.002 7612968 0 1522633 0.225 2.241 3.031 3.836 4.424 7.234 32.784 77.529
20.007 7605142 0 1520996 0.205 2.243 3.025 3.811 4.383 7.205 34.111 73.400
25.000 7538566 0 1507792 0.217 2.245 3.049 3.883 4.510 7.578 33.047 81.920
30.003 7524431 0 1504956 0.220 2.267 3.074 3.906 4.522 7.352 32.244 72.417
30.003 280 0 124601 0.838 15.254 24.396 28.377 31.162 33.161 34.734 34.734
SUMMARY STATS ══════════════════════════════════════════════════════════════════════════════════════════════════════════════
Elapsed time [s] 30.029
CPU time [s] 1293.994
CPU utilisation [%] 67.3
Cycles [op] 45322194
Errors [op] 0
└─ [%] 0.0
Requests [req] 45322194
└─ [req/op] 1.0
Retries [ret] 0
└─ [ret/req] 0.00000
Rows [row] 0
└─ [row/req] 0.0
Concurrency [req] 59 ± 1
└─ [%] 92
Throughput [op/s] 1509287 ± 31023
├─ [req/s] 1509287 ± 31023
└─ [row/s] 0 ± 0
Cycle latency [ms] 2.541 ± 0.002
Request latency [ms] 2.388 ± 0.002
CYCLE LATENCY for write [ms] ══════════════════════════════════════════════════════════════════════════════════════════════
Min 0.225 ± 0.032
25 1.542 ± 0.002
50 2.165 ± 0.003
75 2.943 ± 0.003
90 3.770 ± 0.004
95 4.387 ± 0.007
98 5.718 ± 0.020
99 7.328 ± 0.050
99.9 31.261 ± 0.517
99.99 51.577 ± 1.054
Max 77.857 ± 1.137
CYCLE LATENCY for read [ms] ═══════════════════════════════════════════════════════════════════════════════════════════════
Min 0.205 ± 0.021
25 1.612 ± 0.002
50 2.271 ± 0.002
75 3.072 ± 0.002
90 3.891 ± 0.003
95 4.506 ± 0.005
98 5.890 ± 0.013
99 7.471 ± 0.028
99.9 33.473 ± 0.267
99.99 54.559 ± 0.694
Max 81.920 ± 0.932
Results for latte 0.32.0 with tokio 1.45.0
ubuntu@latte-perf-debug-dev-loader-node-b2e23059-1:~/latte_karol$ ./latte_0.32.0-tokio1.45.0 run ~/latte/simple.rn 10.0.10.196 -P keyspace="\"lattetablets\"" -P tablets=true -q -P row_count=50100100 -d=30s --consistency=LOCAL_QUORUM --threads=60 --concurrency=64 --connections=1 --s>
info: Loading workload script /home/ubuntu/latte/simple.rn...
info: Connecting to ["10.0.10.196"]...
info: Connected to '' cluster running ScyllaDB 2025.3.3-0.20251029.0e6381f14db2 with build-id 05d55ab705a3d90b467774912f5eaa5ff495f1ee
info: Preparing...
info: Warming up...
info: Running benchmark...
CONFIG ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Date Tue, 25 Nov 2025
Time 14:45:38 +0000
Cluster
Datacenter
Rack
DB version ScyllaDB 2025.3.3-0.20251029.0e6381f14db2 with build-id 05d55ab705a3d90b467774912f5eaa5ff495f1ee
Workload simple.rn
Function(s) write:1, read:4
Consistency LocalQuorum
Tags
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-P keyspace "lattetablets"
-P row_count 50100100
-P tablets true
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Threads 60
Connections 1
Concurrency [req] 64
Max rate [op/s]
Warmup [s]
└─ [op] 1
Run time [s] 30.0
└─ [op]
Sampling [s] 5.0
└─ [op]
Request timeout [s] 7
Retries
┌──────┴number 10
├─min interval [ms] 1000
└─max interval [ms] 5000
LOG ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Time Cycles Errors Thrpt. ────────────────────────────────── Latency [ms/op] ──────────────────────────────
[s] [op] [op] [op/s] Min 50 75 90 95 99 99.9 Max
5.006 1830326 0 365656 0.195 6.795 13.697 23.593 31.097 48.792 72.155 116.523
10.001 1838507 0 367722 0.192 6.812 13.640 23.527 30.999 47.809 69.730 96.928
15.010 1838173 0 367577 0.220 6.705 13.476 23.396 31.179 50.102 79.626 145.228
20.009 1837602 0 367612 0.211 6.722 13.541 23.495 31.244 49.545 74.973 114.885
^C 21.611 588675 0 367244 0.240 6.857 13.615 23.314 30.720 48.267 71.369 88.015
SUMMARY STATS ══════════════════════════════════════════════════════════════════════════════════════════════════════════════
Elapsed time [s] 21.633
CPU time [s] 226.948
CPU utilisation [%] 16.4
Cycles [op] 7933283
Errors [op] 0
└─ [%] 0.0
Requests [req] 7933283
└─ [req/op] 1.0
Retries [ret] 0
└─ [ret/req] 0.00000
Rows [row] 0
└─ [row/req] 0.0
Concurrency [req] 59 ± 0
└─ [%] 91
Throughput [op/s] 366727 ± 3603
├─ [req/s] 366727 ± 3603
└─ [row/s] 0 ± 0
Cycle latency [ms] 10.452 ± 0.017
Request latency [ms] 9.753 ± 0.018
CYCLE LATENCY for write [ms] ══════════════════════════════════════════════════════════════════════════════════════════════
Min 0.192 ± 0.054
25 3.803 ± 0.012
50 6.730 ± 0.025
75 13.451 ± 0.051
90 23.249 ± 0.096
95 30.835 ± 0.169
98 41.255 ± 0.220
99 48.660 ± 0.301
99.9 73.269 ± 0.916
99.99 97.124 ± 3.592
Max 142.606 ± 2.922
CYCLE LATENCY for read [ms] ═══════════════════════════════════════════════════════════════════════════════════════════════
Min 0.195 ± 0.018
25 3.799 ± 0.008
50 6.775 ± 0.016
75 13.623 ± 0.035
90 23.544 ± 0.061
95 31.162 ± 0.089
98 41.484 ± 0.149
99 49.054 ± 0.213
99.9 73.990 ± 0.639
99.99 98.042 ± 1.931
Max 145.228 ± 3.017
The interesting part is the throughput: with tokio 1.44.2 we get about 1.5M op/s, while with 1.45.0 it is only about 367k op/s. That is a ~75% reduction in performance!
I generated flamegraphs for the good and the bad run.
Bad: (flamegraph attached)
Good: (flamegraph attached)
Basically the only difference is a larger share of tokio::time::sleep: 20% in the bad run vs 11.5% in the good one.
The only sleep-related change I see in 1.45 is #7226, but I don't know Tokio well enough to say conclusively that this is the cause.
Is this a known issue? Can I help with debugging or fixing it somehow?