
fetch('https://www.php.net') using official nodejs binaries results in 503 - but 200 on other binaries, in browser, curl, etc. #4516


Closed
amezin opened this issue Dec 16, 2024 · 5 comments

@amezin

amezin commented Dec 16, 2024

Node.js Version

v22.12.0

NPM Version

10.9.0

Operating System

Linux amezin-laptop.home.arpa 6.12.4-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Mon, 09 Dec 2024 14:30:31 +0000 x86_64 GNU/Linux

Subsystem

https

Description

  1. Download https://nodejs.org/dist/v22.12.0/node-v22.12.0-linux-x64.tar.xz and unpack
  2. Run bin/node
  3. await fetch('https://www.php.net')

Result:

Response {
  status: 503,
  statusText: 'Service Temporarily Unavailable',
  headers: Headers {
    server: 'myracloud',
    date: 'Mon, 16 Dec 2024 07:43:14 GMT',
    'content-type': 'text/html',
    'transfer-encoding': 'chunked',
    connection: 'keep-alive',
    'cache-control': 'no-cache, no-store, max-age=0',
    'x-content-type-options': 'nosniff',
    'x-xss-protection': '1; mode=block',
    'x-frame-options': 'SAMEORIGIN'
  },
  body: ReadableStream { locked: false, state: 'readable', supportsBYOB: true },
  bodyUsed: false,
  ok: false,
  redirected: false,
  type: 'basic',
  url: 'https://www.php.net/'
}

When I run the same await fetch('https://www.php.net') in nodejs-lts-iron-20.18.1-1 or nodejs-23.4.0-1, both provided by Arch Linux, I get:

Response {
  status: 200,
  statusText: 'OK',
  headers: Headers {
    server: 'myracloud',
    date: 'Mon, 16 Dec 2024 07:43:56 GMT',
    'content-type': 'text/html; charset=utf-8',
    'transfer-encoding': 'chunked',
    connection: 'keep-alive',
    'last-modified': 'Mon, 16 Dec 2024 07:30:09 GMT',
    'content-language': 'en',
    'permissions-policy': 'interest-cohort=()',
    'x-frame-options': 'SAMEORIGIN',
    'set-cookie': 'LAST_NEWS=1734335036; expires=Tue, 16 Dec 2025 07:43:56 GMT; Max-Age=31536000; path=/; domain=.php.net',
    link: '<https://www.php.net/index>; rel=shorturl',
    'content-encoding': 'gzip',
    vary: 'accept-encoding',
    expires: 'Mon, 16 Dec 2024 07:43:56 GMT',
    'cache-control': 'max-age=0'
  },
  body: ReadableStream { locked: false, state: 'readable', supportsBYOB: true },
  bodyUsed: false,
  ok: true,
  redirected: false,
  type: 'basic',
  url: 'https://www.php.net/'
}

The page also opens successfully in Firefox, and curl downloads it without problems.

Why does this happen, and can it be fixed?
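To compare binaries more systematically (official tarball vs. distro packages), a small script can record the Node.js version, the OpenSSL version it reports and the response status together. This is a hypothetical diagnostic sketch, not part of the original report: describeRuntime and probe are illustrative names, and it assumes a Node.js version that ships the global fetch (18+).

```javascript
// Hypothetical diagnostic helper: collects the details that can differ between
// Node.js builds (Node version, OpenSSL version, platform) so that results from
// different binaries can be compared side by side.
function describeRuntime() {
  return {
    node: process.version,
    openssl: process.versions.openssl,
    platform: `${process.platform}-${process.arch}`,
  };
}

// Request a URL with the global fetch() and report the status next to the
// runtime details. The body is cancelled because only the status is needed.
async function probe(url) {
  const res = await fetch(url);
  await res.body?.cancel();
  return { ...describeRuntime(), url, status: res.status };
}
```

For example, append probe('https://www.php.net').then(console.log) and run the file with each binary. A differing openssl value between builds would be one thing worth looking at, though that is an assumption, not a confirmed cause.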

Minimal Reproduction

No response

Output

No response

Before You Submit

  • I have looked for issues that already exist before submitting this
  • My issue follows the guidelines in the README file, and follows the 'How to ask a good question' guide at https://stackoverflow.com/help/how-to-ask
@preveen-stack
Contributor

I was able to reproduce this from a Raspberry Pi 4B with Node 22.12 and Node 18.19.
But on a macOS laptop with Node.js 22.7 on the same network, it worked fine!

@nodejs/net can you check?

~/pg/nodeissue/node-v22.12.0-linux-arm64 $ ./bin/node 
Welcome to Node.js v22.12.0.
Type ".help" for more information.
> await fetch("https://www.php.net")
Response {
  status: 503,
  statusText: 'Service Temporarily Unavailable',
  headers: Headers {
    server: 'myracloud',
    date: 'Wed, 18 Dec 2024 09:01:47 GMT',
    'content-type': 'text/html',
    'transfer-encoding': 'chunked',
    connection: 'keep-alive',
    'cache-control': 'no-cache, no-store, max-age=0',
    'x-content-type-options': 'nosniff',
    'x-xss-protection': '1; mode=block',
    'x-frame-options': 'SAMEORIGIN'
  },
  body: ReadableStream { locked: false, state: 'readable', supportsBYOB: true },
  bodyUsed: false,
  ok: false,
  redirected: false,
  type: 'basic',
  url: 'https://www.php.net/'
}
> await fetch("https://www.php.net")
Response {
  status: 503,
  statusText: 'Service Temporarily Unavailable',
  headers: Headers {
    server: 'myracloud',
    date: 'Wed, 18 Dec 2024 09:01:57 GMT',
    'content-type': 'text/html',
    'transfer-encoding': 'chunked',
    connection: 'keep-alive',
    'cache-control': 'no-cache, no-store, max-age=0',
    'x-content-type-options': 'nosniff',
    'x-xss-protection': '1; mode=block',
    'x-frame-options': 'SAMEORIGIN'
  },
  body: ReadableStream { locked: false, state: 'readable', supportsBYOB: true },
  bodyUsed: false,
  ok: false,
  redirected: false,
  type: 'basic',
  url: 'https://www.php.net/'
}
> .exit
$ node 
Welcome to Node.js v18.19.0.
Type ".help" for more information.
> await fetch("https://www.php.net")
Response {
  [Symbol(realm)]: null,
  [Symbol(state)]: Proxy [
    {
      aborted: false,
      rangeRequested: false,
      timingAllowPassed: true,
      requestIncludesCredentials: true,
      type: 'default',
      status: 503,
      timingInfo: [Object],
      cacheState: '',
      statusText: 'Service Temporarily Unavailable',
      headersList: [HeadersList],
      urlList: [Array],
      body: [Object]
    },
    { get: [Function: get], set: [Function: set] }
  ],
  [Symbol(headers)]: HeadersList {
    cookies: null,
    [Symbol(headers map)]: Map(9) {
      'server' => [Object],
      'date' => [Object],
      'content-type' => [Object],
      'transfer-encoding' => [Object],
      'connection' => [Object],
      'cache-control' => [Object],
      'x-content-type-options' => [Object],
      'x-xss-protection' => [Object],
      'x-frame-options' => [Object]
    },
    [Symbol(headers map sorted)]: null
  }
}
> await fetch("https://www.php.net")
Response {
  [Symbol(realm)]: null,
  [Symbol(state)]: Proxy [
    {
      aborted: false,
      rangeRequested: false,
      timingAllowPassed: true,
      requestIncludesCredentials: true,
      type: 'default',
      status: 503,
      timingInfo: [Object],
      cacheState: '',
      statusText: 'Service Temporarily Unavailable',
      headersList: [HeadersList],
      urlList: [Array],
      body: [Object]
    },
    { get: [Function: get], set: [Function: set] }
  ],
  [Symbol(headers)]: HeadersList {
    cookies: null,
    [Symbol(headers map)]: Map(9) {
      'server' => [Object],
      'date' => [Object],
      'content-type' => [Object],
      'transfer-encoding' => [Object],
      'connection' => [Object],
      'cache-control' => [Object],
      'x-content-type-options' => [Object],
      'x-xss-protection' => [Object],
      'x-frame-options' => [Object]
    },
    [Symbol(headers map sorted)]: null
  }
}
> .exit
Preveens-MacBook-Pro:preveen$ node
Welcome to Node.js v22.7.0.
Type ".help" for more information.
> await fetch("https://www.php.net")
Response {
  status: 200,
  statusText: 'OK',
  headers: Headers {
    server: 'myracloud',
    date: 'Wed, 18 Dec 2024 09:04:11 GMT',
    'content-type': 'text/html; charset=utf-8',
    'transfer-encoding': 'chunked',
    connection: 'keep-alive',
    'last-modified': 'Wed, 18 Dec 2024 08:50:09 GMT',
    'content-language': 'en',
    'permissions-policy': 'interest-cohort=()',
    'x-frame-options': 'SAMEORIGIN',
    'set-cookie': 'LAST_NEWS=1734512650; expires=Thu, 18 Dec 2025 09:04:11 GMT; Max-Age=31536000; path=/; domain=.php.net',
    link: '<https://www.php.net/index>; rel=shorturl',
    'content-encoding': 'gzip',
    vary: 'accept-encoding',
    expires: 'Wed, 18 Dec 2024 09:04:11 GMT',
    'cache-control': 'max-age=0'
  },
  body: ReadableStream { locked: false, state: 'readable', supportsBYOB: true },
  bodyUsed: false,
  ok: true,
  redirected: false,
  type: 'basic',
  url: 'https://www.php.net/'
}
> 

@pimterry
Member

I can reproduce this myself: requests from Node are all rejected, while other tools all work correctly. This happens with both fetch and the https module, on Node v12+ (Node 10 works fine), when making requests from Linux.

It looks like php.net (or more likely the CDN they're using) now blocks some of Node.js's TLS fingerprints completely. Not all of them (not the fingerprint from Macs, by the sounds of @preveen-stack's tests) but at least requests from Node on Linux. I'd guess they've seen some bot traffic with that signature, and some anti-bot mechanisms somewhere have marked it as suspicious.
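A TLS fingerprint is derived from what the client advertises in its ClientHello. As an illustration only, the sketch below computes a JA3-style fingerprint, one widely used format: TLS version, cipher suites, extensions, curves and point formats as decimal values in the order sent, joined and MD5-hashed. Whether this particular CDN uses JA3 or something else is unknown, and the input object shape is an assumption made for this example.

```javascript
const crypto = require('crypto');

// GREASE values (RFC 8701: 0x0a0a, 0x1a1a, ..., 0xfafa) are random
// placeholders some clients insert; JA3 ignores them.
function isGrease(value) {
  return (value & 0x0f0f) === 0x0a0a && (value >> 8) === (value & 0xff);
}

// Build a JA3-style fingerprint from ClientHello fields given as numbers,
// in the order the client sent them. Order is preserved, not sorted.
function ja3(hello) {
  const list = (values) => values.filter((v) => !isGrease(v)).join('-');
  const text = [
    String(hello.version),
    list(hello.ciphers),
    list(hello.extensions),
    list(hello.curves),
    list(hello.pointFormats),
  ].join(',');
  const hash = crypto.createHash('md5').update(text).digest('hex');
  return { text, hash };
}
```

For example, ja3({ version: 771, ciphers: [4865, 4866, 4867], extensions: [0, 10, 11], curves: [29, 23], pointFormats: [0] }).text is '771,4865-4866-4867,0-10-11,29-23,0' (made-up values, not captured from Node). Swapping 4865 and 4866 produces a different hash, which is why merely reordering ciphers, as in the workaround in this thread, can change how a server classifies the client.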

Testing with HTTP Toolkit to confirm: making the request with any recent Node returns a 503, while making it via HTTP Toolkit (which proxies the same content with identical header data etc., but does affect the TLS fingerprint) returns a 200. HTTP Toolkit itself is actually built with Node.js, but modifies its own fingerprint to roughly match Firefox here: https://github.com/httptoolkit/mockttp/blob/703fa1f06fdfa91ddc3ba8494c94747c90462c31/src/rules/passthrough-handling.ts#L25-L100

Copying that logic, if I make a request with the equivalent cipher list, it works:

// These are most of the standard ciphers Node.js uses by default, but reordered
// (Please check this list is correct for your case if you're doing anything
// important with this!)
const ciphers = [
  'TLS_AES_128_GCM_SHA256',
  'TLS_CHACHA20_POLY1305_SHA256',
  'TLS_AES_256_GCM_SHA384',
  'ECDHE-ECDSA-AES128-GCM-SHA256',
  'ECDHE-RSA-AES128-GCM-SHA256',
  'ECDHE-ECDSA-CHACHA20-POLY1305',
  'ECDHE-RSA-CHACHA20-POLY1305',
  'ECDHE-ECDSA-AES256-GCM-SHA384',
  'ECDHE-RSA-AES256-GCM-SHA384',
  'ECDHE-ECDSA-AES256-SHA',
  'ECDHE-ECDSA-AES128-SHA',
  'ECDHE-RSA-AES128-SHA',
  'ECDHE-RSA-AES256-SHA',
  'AES128-GCM-SHA256',
  'AES256-GCM-SHA384'
].join(':')

require('https').request('https://www.php.net', { ciphers }, r => console.log('response', r.statusCode)).end()

This prints response 200, while the same line without the ciphers option prints response 503.
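A variation on the same workaround, sketched under the assumption that any change in cipher order is enough for the server in question: start from Node's own default list (tls.DEFAULT_CIPHERS) and swap two entries, rather than hard-coding a full list. reorderCiphers and getStatus are hypothetical helper names; tls.DEFAULT_CIPHERS and the ciphers option of https.request are standard Node.js APIs.

```javascript
const https = require('https');
const tls = require('tls');

// Swap the 2nd and 3rd entries of a colon-separated OpenSSL cipher string.
// Everything else (including exclusions such as "!aNULL") keeps its position,
// so the same suites are offered, just in a different order.
function reorderCiphers(cipherList = tls.DEFAULT_CIPHERS) {
  const ciphers = cipherList.split(':');
  if (ciphers.length < 3) return cipherList;
  return [ciphers[0], ciphers[2], ciphers[1], ...ciphers.slice(3)].join(':');
}

// GET a URL over HTTPS with the given cipher string; resolves with the status code.
function getStatus(url, ciphers) {
  return new Promise((resolve, reject) => {
    const req = https.request(url, { ciphers }, (res) => {
      res.resume(); // drain the body; only the status code is needed
      resolve(res.statusCode);
    });
    req.on('error', reject);
    req.end();
  });
}
```

For example: getStatus('https://www.php.net', reorderCiphers()).then(console.log). Compared with a hard-coded list like the one above, this keeps whatever suites the running Node version ships, but it only changes the fingerprint slightly, and a minimal reordering like this may be easy for a CDN to detect.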

I wrote an article about workarounds to solve TLS fingerprinting issues in general a while back here: https://httptoolkit.com/blog/tls-fingerprinting-node-js/. There's an issue discussing possible solutions in Node at nodejs/node#41112, but I'm afraid there's no real fix coming there imminently. For now, the easiest way to resolve this is to tweak your request's TLS config somehow, as above.

@amezin
Author

amezin commented Jan 7, 2025

Thanks, this explains everything. Also found confirmations in https://github.com/php/web-php/issues that other types of clients get blocked sometimes too.

@1mike12

1mike12 commented Jan 31, 2025

For anyone following along at home: php.net no longer blocks these requests, whether you rearrange the ciphers or not, since I'm pretty sure they turned off any CDN bot protection. There's nothing in the response headers that would point to Akamai or Cloudflare. At least that's the case on my machine, at this specific point in time. The thing is, blocking techniques are getting more sophisticated, and crude, naive rearrangement of ciphers is very easy to block.

For instance, no matter which ciphers you limit the request to in Node, OpenSSL will add TLS_EMPTY_RENEGOTIATION_INFO_SCSV. Browsers never send this, so it doesn't even take machine learning to tell that this is definitely not a browser. And the CDNs are definitely using machine learning.

@pimterry
Member

pimterry commented Feb 2, 2025

> For instance, no matter what ciphers you limit the request to in node, openssl will add TLS_EMPTY_RENEGOTIATION_INFO_SCSV. This is never used by browsers and it doesn't even take machine learning to pick up on, that this is definitely not a browser. And the CDNs are definitely using machine learning

I fixed that exact issue in OpenSSL a few months ago 😄. OpenSSL 3.4.0+ no longer sends the TLS_EMPTY_RENEGOTIATION_INFO_SCSV for modern TLS: openssl/openssl#24161.
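To see which side of that 3.4.0 threshold a given Node.js binary is on, one option is to parse process.versions.openssl (a string such as "3.0.15+quic"). This hypothetical helper only compares version numbers: it does not inspect the ClientHello, and it assumes the reported version is the one actually used for TLS.

```javascript
// Hypothetical helper: is the OpenSSL version string at least `want`
// ([major, minor, patch])? Suffixes like "+quic" or "w" are ignored.
function opensslAtLeast(versionString, want) {
  const m = /^(\d+)\.(\d+)\.(\d+)/.exec(versionString);
  if (!m) return false; // unrecognised format: don't claim support
  const have = m.slice(1).map(Number);
  for (let i = 0; i < 3; i++) {
    if (have[i] !== want[i]) return have[i] > want[i];
  }
  return true; // all three components equal
}
```

For example, opensslAtLeast(process.versions.openssl, [3, 4, 0]) returns false for "3.0.15+quic" and true for "3.4.0".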

This fingerprinting is problematic, but there's quite a bit of progress that can be made here! I think it's quite plausible to get to a stage in future where you can configure Node with a TLS fingerprint indistinguishable from browsers (although it'll take plenty of time: more changes in OpenSSL, plus time to migrate Node to those versions). There'll always be more tiny fingerprinting differences elsewhere of course, but there's also a limit to how strict most servers can be, since they need to avoid e.g. unexpectedly blocking new browser releases.
