michaelsmithxyz
Fixes #59733

This PR addresses an increase in heap usage in Node versions containing the change introduced in #59086. That change added a cache to getNearestParentPackageJSON in package_json_reader to avoid the overhead of repeatedly calling into C++ when the function is invoked repeatedly on the same file path. A similar cache existed in the original implementation, before #50322 moved this package.json reading to C++. However, the original cache was a map from package.json path to a deserialized object, whereas the new cache is a map from any given module file path to an object representing its parent package.json file. Because it's common for multiple files to share a parent package.json, this cache often contains many more entries than the original cache, with many of them pointing to duplicate data.
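To illustrate the difference in cache size described above, here is a minimal sketch. The package layout, the file count of 300, and the `loadPackageConfig` helper are all hypothetical stand-ins, not Node's internals; the point is only how the number of entries scales with each choice of key.

```javascript
const path = require('node:path');

// Hypothetical layout: many module files that all belong to one package.
const packageRoot = path.join(path.sep, 'app', 'node_modules', 'some-pkg');
const pjsonPath = path.join(packageRoot, 'package.json');
const moduleFiles = [];
for (let i = 0; i < 300; i++) {
  moduleFiles.push(path.join(packageRoot, 'lib', `mod${i}.js`));
}

// Hypothetical stand-in for a deserialized package.json (not Node's internal shape).
const loadPackageConfig = (p) => ({ path: p, data: { name: 'some-pkg' } });

// Keyed by module file path: one entry per file, each holding its own copy.
const byModulePath = new Map();
// Keyed by package.json path: one shared entry per package.
const byPackageJSONPath = new Map();

for (const file of moduleFiles) {
  byModulePath.set(file, loadPackageConfig(pjsonPath));
  if (!byPackageJSONPath.has(pjsonPath)) {
    byPackageJSONPath.set(pjsonPath, loadPackageConfig(pjsonPath));
  }
}

console.log(byModulePath.size, byPackageJSONPath.size); // prints "300 1"
```

With a per-file key the entry count follows the number of modules loaded; with a per-package key it follows the number of packages.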

I've addressed the issue here by moving the filesystem traversal which finds the parent package.json file for a given file back to the JS side of the fence, and adjusting the cache to be a map from package.json file path to its deserialized object. Doing this makes the cache size manageable again. The traversal code is basically lifted from the original JS-side implementation that got removed. GetNearestParentPackageJSON on the C++ side isn't called anywhere anymore, so I've also removed that.
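The approach described above can be sketched roughly as follows. This is not the code from the PR: `readPackageJSON`, `findNearestParentPackageJSON`, and `packageJSONCache` are illustrative names, the sketch uses plain `fs` calls rather than Node's internal bindings, it caches only successful reads, and it omits any special cases the real traversal handles.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Cache keyed by package.json path, so every file in a package shares one entry.
const packageJSONCache = new Map();

// Hypothetical helper: parse the package.json at pjsonPath, or return undefined if it does not exist.
function readPackageJSON(pjsonPath) {
  if (packageJSONCache.has(pjsonPath)) return packageJSONCache.get(pjsonPath);
  let contents;
  try {
    contents = fs.readFileSync(pjsonPath, 'utf8');
  } catch (err) {
    if (err.code === 'ENOENT') return undefined;
    throw err;
  }
  const entry = { path: pjsonPath, data: JSON.parse(contents) };
  packageJSONCache.set(pjsonPath, entry);
  return entry;
}

// Walk up from the file's directory until a package.json is found or the root is reached.
function findNearestParentPackageJSON(filePath) {
  let dir = path.dirname(filePath);
  while (true) {
    const found = readPackageJSON(path.join(dir, 'package.json'));
    if (found !== undefined) return found;
    const parent = path.dirname(dir);
    if (parent === dir) return undefined; // reached the filesystem root
    dir = parent;
  }
}

// Demo in a temporary directory: two module paths under the same package.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'pjson-sketch-'));
fs.writeFileSync(path.join(root, 'package.json'), JSON.stringify({ name: 'demo' }));
const a = findNearestParentPackageJSON(path.join(root, 'lib', 'a.js'));
const b = findNearestParentPackageJSON(path.join(root, 'lib', 'deep', 'b.js'));
console.log(a === b, packageJSONCache.size); // prints "true 1"
fs.rmSync(root, { recursive: true, force: true });
```

Both lookups return the same object, and the cache holds a single entry regardless of how many files in the package are resolved.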

Memory benchmarks

I measured memory usage with this script. date-fns is structured in a way that clearly exhibits the heap usage change:

package.json:

{
  "dependencies": {
    "date-fns": "^4.1.0"
  }
}

Script (run with --expose-gc so that global.gc is available):

const dateFns = require('date-fns');

const main = async () => {
  await global.gc({ execution: 'async', type: 'major-snapshot', filename: `heap-${process.version}.heapsnapshot` });
  console.log(process.memoryUsage());
};

main().then(() => process.exit(0));

main

{
  rss: 273694720,
  heapTotal: 28590080,
  heapUsed: 27558272,
  external: 10461,
  arrayBuffers: 10475
}

Post-change

{
  rss: 102776832,
  heapTotal: 7094272,
  heapUsed: 6154856,
  external: 10461,
  arrayBuffers: 10475
}
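For a quick read of the two snapshots above, the relative change can be computed directly from the reported numbers. This is only arithmetic on the figures in this description (a single run each), and the `before`, `after`, and `reduction` names are illustrative.

```javascript
// Values copied from the process.memoryUsage() output on main and post-change.
const before = { rss: 273694720, heapTotal: 28590080, heapUsed: 27558272 };
const after = { rss: 102776832, heapTotal: 7094272, heapUsed: 6154856 };

const reduction = {};
for (const key of Object.keys(before)) {
  const savedBytes = before[key] - after[key];
  reduction[key] = {
    savedMiB: savedBytes / (1024 * 1024),
    percent: (savedBytes / before[key]) * 100,
  };
  console.log(
    `${key}: -${reduction[key].savedMiB.toFixed(1)} MiB ` +
    `(-${reduction[key].percent.toFixed(1)}%)`
  );
}
```

On these numbers, heapUsed drops by a little under 78% and rss by a little over 62%.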

Runtime benchmarks

I tested runtime impact with two scripts, one of which is the benchmark from the original PR that re-introduced the cache. I will say that I'm not totally confident in the DataDog/CDK reproduction from the original issue. Even without any caching at all, I don't see performance as bad as the original reports.

package.json:

{
  "dependencies": {
    "aws-cdk-lib": "^2.214.0",
    "constructs": "^10.4.2",
    "date-fns": "^4.1.0",
    "dd-trace": "^5.65.0"
  }
}

date-fns-benchmark.js:

for (let i = 0; i < 1000; i++) {
  require('date-fns');
}

dd-cdk-benchmark.js:

require('dd-trace').init();
const cdk = require('aws-cdk-lib');

const app = new cdk.App();
for (let i = 0; i < 1000; i++) {
  new cdk.Stack(app, `DdTraceStack${i}`);
}

main

➜ hyperfine --warmup 5 -N "../node/node date-fns-benchmark.js"
Benchmark 1: ../node/node date-fns-benchmark.js
  Time (mean ± σ):      84.0 ms ±   0.6 ms    [User: 96.9 ms, System: 15.6 ms]
  Range (min … max):    82.9 ms …  86.4 ms    35 runs
➜ hyperfine --warmup 5 -N "../node/node dd-cdk-benchmark.js"
Benchmark 1: ../node/node dd-cdk-benchmark.js
  Time (mean ± σ):     154.3 ms ±   0.8 ms    [User: 162.5 ms, System: 21.5 ms]
  Range (min … max):   153.0 ms … 155.8 ms    19 runs

Post-change

➜ hyperfine --warmup 5 -N "../node/node date-fns-benchmark.js"
Benchmark 1: ../node/node date-fns-benchmark.js
  Time (mean ± σ):      50.5 ms ±   0.5 ms    [User: 48.4 ms, System: 11.4 ms]
  Range (min … max):    49.4 ms …  52.4 ms    59 runs
➜ hyperfine --warmup 5 -N "../node/node dd-cdk-benchmark.js"
Benchmark 1: ../node/node dd-cdk-benchmark.js
  Time (mean ± σ):     155.9 ms ±   1.0 ms    [User: 160.6 ms, System: 24.6 ms]
  Range (min … max):   154.2 ms … 157.7 ms    19 runs
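The relative change in mean wall-clock time can be read off the hyperfine output above. The snippet below is only arithmetic on the reported means; the `runs` and `changePercent` names are illustrative.

```javascript
// Mean times (ms) copied from the hyperfine output on main and post-change.
const runs = {
  'date-fns-benchmark.js': { mainMs: 84.0, postMs: 50.5 },
  'dd-cdk-benchmark.js': { mainMs: 154.3, postMs: 155.9 },
};

const changePercent = {};
for (const [name, { mainMs, postMs }] of Object.entries(runs)) {
  // Negative values mean the post-change build is faster.
  changePercent[name] = ((postMs - mainMs) / mainMs) * 100;
  console.log(`${name}: ${changePercent[name].toFixed(1)}%`);
}
```

On these means the date-fns benchmark is about 40% faster post-change, while the dd-trace/CDK benchmark is about 1% slower, a difference of 1.6 ms against a reported σ of 0.8 to 1.0 ms.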

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/loaders

@nodejs-github-bot added the c++, module, needs-ci, and typings labels Sep 14, 2025

@GeoffreyBooth
Member

Why not put the cache on the C++ side?

cc @nodejs/performance

@BridgeAR
Member

Why not put the cache on the C++ side?

@GeoffreyBooth the boundary crossing is adding up to a big overhead in some modules (see current JS cache addition).

@mcollina left a comment
Member

lgtm

codecov bot commented Sep 15, 2025

Codecov Report

❌ Patch coverage is 94.23077% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.27%. Comparing base (4984b15) to head (16d38f8).
⚠️ Report is 14 commits behind head on main.

Files with missing lines                      Patch %   Lines
lib/internal/modules/package_json_reader.js   94.23%    3 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #59888      +/-   ##
==========================================
- Coverage   88.28%   88.27%   -0.01%     
==========================================
  Files         702      702              
  Lines      206804   206901      +97     
  Branches    39793    39808      +15     
==========================================
+ Hits       182571   182651      +80     
- Misses      16238    16264      +26     
+ Partials     7995     7986       -9     

Files with missing lines                      Coverage Δ
src/node_modules.cc                           76.34% <ø> (-0.72%) ⬇️
src/node_modules.h                            100.00% <ø> (ø)
lib/internal/modules/package_json_reader.js   97.61% <94.23%> (-1.78%) ⬇️

... and 38 files with indirect coverage changes

@michaelsmithxyz force-pushed the parent-package-json-cache-reduction branch from ebf17f2 to 16d38f8 on September 15, 2025 13:24

@RafaelGSS left a comment
Member

LGTM on green ci.

@aduh95 added the author ready and request-ci labels Sep 15, 2025
@github-actions bot removed the request-ci label Sep 15, 2025

Successfully merging this pull request may close these issues.

Significant heap usage regression in Node 22.19.0