Skip to content

Performance regression in 1.45.0 #7744

@Lorak-mmk

Description

@Lorak-mmk

Version
1.45.0.
I also tried with 1.48.0, the regression is still present.
1.44.x is fine, 1.43.x as well.

Platform
Linux latte-perf-debug-dev-loader-node-b2e23059-1 6.14.0-1017-aws #17~24.04.1-Ubuntu SMP Wed Nov 5 10:48:17 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Description

Tokio 1.45.0 and up introduced a significant performance regression, which we noticed using (our fork of) Scylla/Cassandra benchmarking tool latte.

Results for latte 0.32.0 with tokio 1.44.2
ubuntu@latte-perf-debug-dev-loader-node-b2e23059-1:~/latte_karol$ ./latte_0.32.0-tokio1.44.2 run ~/latte/simple.rn 10.0.10.196 -P keyspace="\"lattetablets\"" -P tablets=true -q -P row_count=50100100 -d=30s --consistency=LOCAL_QUORUM --threads=60 --concurrency=64 --connections=1 --s>
info: Loading workload script /home/ubuntu/latte/simple.rn...
info: Connecting to ["10.0.10.196"]... 
info: Connected to '' cluster running ScyllaDB 2025.3.3-0.20251029.0e6381f14db2 with build-id 05d55ab705a3d90b467774912f5eaa5ff495f1ee
info: Preparing...
info: Warming up...
info: Running benchmark...
CONFIG ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
            Date            Tue, 25 Nov 2025                                                          
            Time            14:49:48 +0000                                                            
         Cluster                                                                                      
      Datacenter                                                                                      
            Rack                                                                                      
      DB version            ScyllaDB 2025.3.3-0.20251029.0e6381f14db2 with build-id 05d55ab705a3d90b467774912f5eaa5ff495f1ee                                            
        Workload            simple.rn                                                                 
     Function(s)            write:1, read:4                                                           
     Consistency            LocalQuorum                                                               
            Tags                                                                                      
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     -P keyspace            "lattetablets"                                                            
    -P row_count            50100100                                                                  
      -P tablets            true                                                                      
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
         Threads                   60                                                                 
     Connections                    1                                                                 
     Concurrency     [req]         64                                                                 
        Max rate    [op/s]                                                                            
          Warmup       [s]                                                                            
              └─      [op]          1                                                                 
        Run time       [s]       30.0                                                                 
              └─      [op]                                                                            
        Sampling       [s]        5.0                                                                 
              └─      [op]                                                                            
 Request timeout       [s]          7                                                                 
         Retries                                                                                      
  ┌──────┴number                   10                                                                 
  ├─min interval      [ms]       1000                                                                 
  └─max interval      [ms]       5000                                                                 

LOG ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
    Time    Cycles    Errors    Thrpt.     ────────────────────────────────── Latency [ms/op] ──────────────────────────────
     [s]      [op]      [op]    [op/s]             Min        50        75        90        95        99      99.9       Max
   5.002   7464708         0   1492498           0.216     2.232     3.049     3.916     4.608     8.593    34.898    65.831
  10.000   7576099         0   1515262           0.235     2.265     3.056     3.863     4.456     7.115    30.654    79.430
  15.002   7612968         0   1522633           0.225     2.241     3.031     3.836     4.424     7.234    32.784    77.529
  20.007   7605142         0   1520996           0.205     2.243     3.025     3.811     4.383     7.205    34.111    73.400
  25.000   7538566         0   1507792           0.217     2.245     3.049     3.883     4.510     7.578    33.047    81.920
  30.003   7524431         0   1504956           0.220     2.267     3.074     3.906     4.522     7.352    32.244    72.417
  30.003       280         0    124601           0.838    15.254    24.396    28.377    31.162    33.161    34.734    34.734

SUMMARY STATS ══════════════════════════════════════════════════════════════════════════════════════════════════════════════
    Elapsed time       [s]     30.029                                                                 
        CPU time       [s]   1293.994                                                                 
 CPU utilisation       [%]       67.3                                                                 
          Cycles      [op]   45322194                                                                 
          Errors      [op]          0                                                                 
              └─       [%]        0.0                                                                 
        Requests     [req]   45322194                                                                 
              └─  [req/op]        1.0                                                                 
         Retries     [ret]          0                                                                 
              └─ [ret/req]    0.00000                                                                 
            Rows     [row]          0                                                                 
              └─ [row/req]        0.0                                                                 
     Concurrency     [req]         59 ± 1                                                             
              └─       [%]         92                                                                 
      Throughput    [op/s]    1509287 ± 31023                                                         
              ├─   [req/s]    1509287 ± 31023                                                         
              └─   [row/s]          0 ± 0                                                             
   Cycle latency      [ms]      2.541 ± 0.002                                                         
 Request latency      [ms]      2.388 ± 0.002                                                         

CYCLE LATENCY for write [ms]  ══════════════════════════════════════════════════════════════════════════════════════════════
          Min                   0.225 ± 0.032                                                         
           25                   1.542 ± 0.002                                                         
           50                   2.165 ± 0.003                                                         
           75                   2.943 ± 0.003                                                         
           90                   3.770 ± 0.004                                                         
           95                   4.387 ± 0.007                                                         
           98                   5.718 ± 0.020                                                         
           99                   7.328 ± 0.050                                                         
           99.9                31.261 ± 0.517                                                         
           99.99               51.577 ± 1.054                                                         
          Max                  77.857 ± 1.137                                                         

CYCLE LATENCY for read [ms]  ═══════════════════════════════════════════════════════════════════════════════════════════════
          Min                   0.205 ± 0.021                                                         
           25                   1.612 ± 0.002                                                         
           50                   2.271 ± 0.002                                                         
           75                   3.072 ± 0.002                                                         
           90                   3.891 ± 0.003                                                         
           95                   4.506 ± 0.005                                                         
           98                   5.890 ± 0.013                                                         
           99                   7.471 ± 0.028                                                         
           99.9                33.473 ± 0.267                                                         
           99.99               54.559 ± 0.694                                                         
          Max                  81.920 ± 0.932                                                         
Results for latte 0.32.0 with tokio 1.45.0
ubuntu@latte-perf-debug-dev-loader-node-b2e23059-1:~/latte_karol$ ./latte_0.32.0-tokio1.45.0 run ~/latte/simple.rn 10.0.10.196 -P keyspace="\"lattetablets\"" -P tablets=true -q -P row_count=50100100 -d=30s --consistency=LOCAL_QUORUM --threads=60 --concurrency=64 --connections=1 --s>
info: Loading workload script /home/ubuntu/latte/simple.rn...
info: Connecting to ["10.0.10.196"]... 
info: Connected to '' cluster running ScyllaDB 2025.3.3-0.20251029.0e6381f14db2 with build-id 05d55ab705a3d90b467774912f5eaa5ff495f1ee
info: Preparing...
info: Warming up...
info: Running benchmark...
CONFIG ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
            Date            Tue, 25 Nov 2025                                                          
            Time            14:45:38 +0000                                                            
         Cluster                                                                                      
      Datacenter                                                                                      
            Rack                                                                                      
      DB version            ScyllaDB 2025.3.3-0.20251029.0e6381f14db2 with build-id 05d55ab705a3d90b467774912f5eaa5ff495f1ee                                            
        Workload            simple.rn                                                                 
     Function(s)            write:1, read:4                                                           
     Consistency            LocalQuorum                                                               
            Tags                                                                                      
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     -P keyspace            "lattetablets"                                                            
    -P row_count            50100100                                                                  
      -P tablets            true                                                                      
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
         Threads                   60                                                                 
     Connections                    1                                                                 
     Concurrency     [req]         64                                                                 
        Max rate    [op/s]                                                                            
          Warmup       [s]                                                                            
              └─      [op]          1                                                                 
        Run time       [s]       30.0                                                                 
              └─      [op]                                                                            
        Sampling       [s]        5.0                                                                 
              └─      [op]                                                                            
 Request timeout       [s]          7                                                                 
         Retries                                                                                      
  ┌──────┴number                   10                                                                 
  ├─min interval      [ms]       1000                                                                 
  └─max interval      [ms]       5000                                                                 

LOG ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
    Time    Cycles    Errors    Thrpt.     ────────────────────────────────── Latency [ms/op] ──────────────────────────────
     [s]      [op]      [op]    [op/s]             Min        50        75        90        95        99      99.9       Max
   5.006   1830326         0    365656           0.195     6.795    13.697    23.593    31.097    48.792    72.155   116.523
  10.001   1838507         0    367722           0.192     6.812    13.640    23.527    30.999    47.809    69.730    96.928
  15.010   1838173         0    367577           0.220     6.705    13.476    23.396    31.179    50.102    79.626   145.228
  20.009   1837602         0    367612           0.211     6.722    13.541    23.495    31.244    49.545    74.973   114.885
^C  21.611    588675         0    367244           0.240     6.857    13.615    23.314    30.720    48.267    71.369    88.015

SUMMARY STATS ══════════════════════════════════════════════════════════════════════════════════════════════════════════════
    Elapsed time       [s]     21.633                                                                 
        CPU time       [s]    226.948                                                                 
 CPU utilisation       [%]       16.4                                                                 
          Cycles      [op]    7933283                                                                 
          Errors      [op]          0                                                                 
              └─       [%]        0.0                                                                 
        Requests     [req]    7933283                                                                 
              └─  [req/op]        1.0                                                                 
         Retries     [ret]          0                                                                 
              └─ [ret/req]    0.00000                                                                 
            Rows     [row]          0                                                                 
              └─ [row/req]        0.0                                                                 
     Concurrency     [req]         59 ± 0                                                             
              └─       [%]         91                                                                 
      Throughput    [op/s]     366727 ± 3603                                                          
              ├─   [req/s]     366727 ± 3603                                                          
              └─   [row/s]          0 ± 0                                                             
   Cycle latency      [ms]     10.452 ± 0.017                                                         
 Request latency      [ms]      9.753 ± 0.018                                                         

CYCLE LATENCY for write [ms]  ══════════════════════════════════════════════════════════════════════════════════════════════
          Min                   0.192 ± 0.054                                                         
           25                   3.803 ± 0.012                                                         
           50                   6.730 ± 0.025                                                         
           75                  13.451 ± 0.051                                                         
           90                  23.249 ± 0.096                                                         
           95                  30.835 ± 0.169                                                         
           98                  41.255 ± 0.220                                                         
           99                  48.660 ± 0.301                                                         
           99.9                73.269 ± 0.916                                                         
           99.99               97.124 ± 3.592                                                         
          Max                 142.606 ± 2.922                                                         

CYCLE LATENCY for read [ms]  ═══════════════════════════════════════════════════════════════════════════════════════════════
          Min                   0.195 ± 0.018                                                         
           25                   3.799 ± 0.008                                                         
           50                   6.775 ± 0.016                                                         
           75                  13.623 ± 0.035                                                         
           90                  23.544 ± 0.061                                                         
           95                  31.162 ± 0.089                                                         
           98                  41.484 ± 0.149                                                         
           99                  49.054 ± 0.213                                                         
           99.9                73.990 ± 0.639                                                         
           99.99               98.042 ± 1.931                                                         
          Max                 145.228 ± 3.017                                       

The interesting part is the throughput. For tokio 1.44.2, we can do about 1.5M ops/s. For 1.45.0, it is only about 367k ops/s. 75% reduction in performance!

I generated flamegraphs for good and bad run.
Bad:

Image

Good:

Image

Basically the only difference is a bigger presence of tokio::time::sleep - 20% vs 11.5%.
The only sleep-related change I see in 1.45 is #7226 , but I don't know Tokio well enough to conclusively say this is the cause.

Is that a known issue? Can I somehow help with debugging / fixing it?

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-tokioArea: The main tokio crateM-timeModule: tokio/timeT-performanceTopic: performance and benchmarks

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions