Performance regression node 16.20.0 -> 18.16.0 #95

Open
LuisFros opened this issue Jun 7, 2023 · 19 comments
@LuisFros

LuisFros commented Jun 7, 2023

I am running a simple HTTP GraphQL server and following the guide at https://nodejs.org/en/docs/guides/simple-profiling

  • What factors or changes could account for this performance regression?
  • Internal profiling shows no significant difference; the regression is mostly visible in request/response times

------------- NODE 16 ----------------

This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Finished 250 requests


Server Software:        
Server Hostname:        localhost
Server Port:            3100

Document Path:          /graphql
Document Length:        34 bytes

Concurrency Level:      20
Time taken for tests:   3.203 seconds
Complete requests:      250
Failed requests:        0
Keep-Alive requests:    250
Total transferred:      75000 bytes
Total body sent:        367750
HTML transferred:       8500 bytes
Requests per second:    78.06 [#/sec] (mean)
Time per request:       256.227 [ms] (mean)
Time per request:       12.811 [ms] (mean, across all concurrent requests)
Transfer rate:          22.87 [Kbytes/sec] received
                        112.13 kb/s sent
                        135.00 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       1
Processing:   186  224  60.2    208     451
Waiting:      186  224  60.2    208     451
Total:        186  224  60.4    208     452

Percentage of the requests served within a certain time (ms)
  50%    208
  66%    213
  75%    215
  80%    217
  90%    224
  95%    424
  98%    440
  99%    447
 100%    452 (longest request)

------------- NODE 18 ----------------

This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Finished 250 requests


Server Software:        
Server Hostname:        localhost
Server Port:            3100

Document Path:          /graphql
Document Length:        34 bytes

Concurrency Level:      20
Time taken for tests:   29.894 seconds
Complete requests:      250
Failed requests:        0
Keep-Alive requests:    250
Total transferred:      75000 bytes
Total body sent:        367750
HTML transferred:       8500 bytes
Requests per second:    8.36 [#/sec] (mean)
Time per request:       2391.480 [ms] (mean)
Time per request:       119.574 [ms] (mean, across all concurrent requests)
Transfer rate:          2.45 [Kbytes/sec] received
                        12.01 kb/s sent
                        14.46 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       1
Processing:   191  345 1682.1    219   26814
Waiting:      191  345 1682.0    219   26813
Total:        191  345 1682.1    219   26814

Percentage of the requests served within a certain time (ms)
  50%    219
  66%    226
  75%    229
  80%    235
  90%    257
  95%    470
  98%    473
  99%    473
 100%  26814 (longest request)
@aduh95

aduh95 commented Jun 7, 2023

Duplicate of #79 and/or #72?

@H4ad
Member

H4ad commented Jun 7, 2023

@LuisFros Can you share some code on Stackblitz or some repo?

Also, are you using the crypto module?

@LuisFros
Author

LuisFros commented Jun 9, 2023

I am using the crypto module and already handled the performance issue from #72.
Unfortunately I can't share the code, but how could I provide more information without the code itself?

@NewEraCracker

For your specific usage scenario, Node.js 18 appears to exhibit a significant performance regression compared to Node.js 16: a throughput drop of roughly 88% (fewer requests processed in the same time frame).

A proof of concept would be useful, nonetheless.

Could you perhaps try a custom build of Node.js 18 linked against shared OpenSSL 1.1.1, and run your benchmark again to compare the results?

@LuisFros
Author

LuisFros commented Jun 12, 2023

I haven't tried running the custom build with OpenSSL 1.1.1, since profiling no longer shows the performance regression being caused by the crypto module.

Unrelated to my issue, I managed to create a reproducible case showing a significant performance regression when a function is passed down as an argument and then called.
Here is the code example: https://stackblitz.com/edit/stackblitz-starters-sb65sg?file=index.js
When running the code in index.js on node 18.16.0, the second benchmark case ('Nested call') is ~3x slower than on 16.20.

EDIT: I forgot to mention that the regression is gone in node 20
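For readers without access to the Stackblitz, here is a minimal sketch of the pattern being benchmarked. The function names and workload are illustrative, not the actual Stackblitz code:

```javascript
// Compare calling a function directly vs. passing it down as an
// argument and calling it through that reference ("nested call").
function work(n) {
  let sum = 0;
  for (let i = 0; i < n; i++) sum += i;
  return sum;
}

function nested(fn, n) {
  return fn(n); // call through the passed-in reference
}

function bench(label, run) {
  const start = process.hrtime.bigint();
  let acc = 0;
  for (let i = 0; i < 1e5; i++) acc += run();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${ms.toFixed(1)} ms`);
  return acc;
}

const direct = bench('Direct call', () => work(100));
const viaArg = bench('Nested call', () => nested(work, 100));
console.log(direct === viaArg); // both paths compute the same result
```

On an affected Node build, the 'Nested call' timing would be the one showing the slowdown.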

@LuisFros
Author

LuisFros commented Jun 16, 2023

I created a sample case for the issue with a simple HTTP server, comparing response times across Node versions:

Code:

const http = require('http')

const host = 'localhost'
const port = parseInt(process.argv[2]) || 3000

const server = http.createServer((req, res) => {
  if (req.url === '/') {
    res.writeHead(200, { 'Content-Type': 'text/plain' })
    res.end(`Use ${host}:${port}/test`)
  } else if (req.url === '/test') {
    res.writeHead(200, { 'Content-Type': 'text/plain' })
    res.end('test')
  } else {
    res.writeHead(404, { 'Content-Type': 'text/plain' })
    res.end('Not found')
  }
})

server.listen(port, host, () => {
  console.log(`Server is running on http://${host}:${port}`)
})

The tests were run using

ab -k -c $concurrency -n $count "http://127.0.0.1:$port/test"

The x-axis label is {$concurrency}-{$requests} and the y-axis is requests/sec from the ab output.

[image: requests/sec per Node version for each {$concurrency}-{$requests} combination]

It is clear that at high concurrency levels there is a significant performance regression for a simple HTTP server.
[image: benchmark results at higher concurrency levels]

With 150 concurrent connections and over 200K requests, 18.16.0 is around 5% slower than 16.20.0.

@RafaelGSS
Member

@LuisFros The regression between 18 and 20 is quite different from the ones I measured some time ago (I haven't published them anywhere yet). Would you mind sharing the hardware information? Was it a dedicated machine?

@LuisFros
Author

@RafaelGSS
Yeah it was a dedicated machine:

  • Ubuntu 22.04 LTS
  • Intel Broadwell
  • x86/64
  • 4vCPU 16GB Memory

@LuisFros
Author

LuisFros commented Sep 4, 2023

Hi, are there any updates on this from the performance team?

@LuisFros
Author

LuisFros commented Sep 8, 2023

UPDATE: I ran the tests again with the latest versions of each line (node 20.6.0, node 18.17.1, and node 16.20.2) and with more combinations of {$concurrency}-{$requests}

  • The results still show a performance regression, and node 20 has given even worse results
[image: updated requests/sec comparison across node 16.20.2, 18.17.1, and 20.6.0]

@santigimeno
Member

I've been looking into the v20 regression.

The issue seems to be caused by this libuv commit, which makes libuv stop accepting connections in a loop and instead defers handling of each subsequent incoming connection to the next loop iteration. Reverting it makes v20 on par with v18.

As explained in the commit, the main reason for this change is that in most scenarios we save an extra accept() call. Are we sure the scenario this benchmark shows is representative of real-life usage, or is it more of an academic case? Just asking so we can decide whether to revert that commit.

/cc @bnoordhuis, what are your thoughts?

@bnoordhuis
Member

I'm willing to believe it impacts a single-process hello world server but I'm ambivalent on whether that's something to optimize for.

I'd be interested to see how the cluster module fares with NODE_CLUSTER_SCHED_POLICY=none. A big reason for removing the accept loop is that it results in lopsided load distribution.

Node works around that by doing round robin distribution but I don't want to force other libuv users into working around libuv's shortcomings.

@mcollina
Member

My experience with ab is that it opens way more sockets than needed, stressing those parts.

@bnoordhuis
Member

That's true. ab is kind of horrible, wrk is better in just about every aspect.

@LuisFros
Author

My experience with ab is that it opens way more sockets than needed, stressing those parts.

Do you think it would be worth replicating the tests with another tool? Is wrk the recommended way to test this?

@RafaelGSS
Member

My experience with ab is that it opens way more sockets than needed, stressing those parts.

Do you think it would be worth replicating the tests with another tool? Is wrk the recommended way to test this?

I suggest wrk2.

@violetviolinist

Hello, are there any updates on this? I think I am seeing a similar degradation while moving from v14 to v20.18.0. Does v22 address this in any capacity?

@lemire
Member

lemire commented Nov 6, 2024

@violetviolinist Reading the comments suggests that people are unclear as to whether it is an actual meaningful regression.

@NewEraCracker

In my application I do not see any noticeable impact from this issue. Could this be connected to the fact that Node.js 18 prefers IPv6 by default?

https://x.com/matteocollina/status/1640384245834055680

@matteocollina In Node.js v18 we removed an old-time fix that reordered the DNS results to put IPv4 addresses before IPv6. As IPv6 gets widespread usage, this is the right call to do.

This has exposed so many broken IPv6 networks. Work around it with:

node --dns-result-order=ipv4first

I've actually updated the services I could to version 22 and it is quite good.


10 participants