
fs.read sends callback a buffer with more than what was read #3657

Closed
@wbt

Description

Version

17

Platform

Win10 Pro x64

Subsystem

fs

What steps will reproduce the bug?

Summary:

As documented, both forms of the fs.read() function take as their final parameter a callback which accepts three arguments: (err, bytesRead, buffer). The buffer passed to this callback has the same length as the buffer passed in; it is not limited to the number of bytes read.

As a workaround, the first line of the callback can be set to:
const properlyLimitedBuffer = Buffer.from(buffer.buffer, buffer.byteOffset, bytesRead);
and then the rest of the callback function should use properlyLimitedBuffer instead of buffer. This is fairly memory-efficient because Buffer.from() just creates a new view without copying the contents of the underlying memory. However, it appears that something like this line should be in the Node code just prior to calling the callback.
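The mechanics of this workaround can be sketched without any file I/O. This is a minimal illustration, simulating a "read" by writing into a reusable buffer that still holds stale data from a previous, longer use:

```javascript
// Simulate a reusable buffer that still holds stale data from a prior read.
const reusableBuffer = Buffer.alloc(16);
reusableBuffer.write('SECRET-FROM-DOUG');       // previous (longer) contents
const bytesRead = reusableBuffer.write('Hi');   // new, shorter "read" (2 bytes)

// Without trimming, the whole 16-byte buffer (stale data included) leaks through.
console.log(reusableBuffer.toString());         // 'HiCRET-FROM-DOUG'

// The workaround: a zero-copy view limited to what was actually read.
const properlyLimitedBuffer = Buffer.from(
  reusableBuffer.buffer, reusableBuffer.byteOffset, bytesRead);
console.log(properlyLimitedBuffer.toString()); // 'Hi'
```

Note that the view still aliases the underlying memory, so it is only safe as long as the original buffer is not reused before the view is consumed.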

Steps To Reproduce:

These steps are for use case 1 as described under "User Impact" below.

  1. Have the latest version of Node installed. This was tested with v16.10.0 and code inspection of the current master branch suggests the issue continues on any later version.
  2. In a known directory, create the following file, called copy.js:
const fs = require('fs');
/**
* Assumes inputDir & outputDir already exist.
* Yes, there are easier ways to copy file contents.
* You can imagine whatever other kind of processing that adds more value
* that you want, including processing that writes to databases, summarizes text,
* etc.; adding anything else here unnecessarily complicates the demonstration.
*/
function copyDirContents(inputDir, outputDir) {
  const reusableBuffer = Buffer.alloc(16384);
  fs.readdir(inputDir, async (err, files) => {
    for(let filename of files) {
      const inputPath = inputDir+'/'+filename;
      const outputPath = outputDir+'/'+filename;
      await new Promise(function(resolve, reject) {
        fs.open(inputPath, 'r', (err, fd) => {
          //In this oversimplified demo, one read gets the full file.
          fs.read(fd, {buffer: reusableBuffer}, (err, bytesRead, buffer) => {
            //buffer = Buffer.from(buffer.buffer, buffer.byteOffset, bytesRead);
            if(bytesRead > 0) {
              fs.writeFile(outputPath, buffer, (err) => {
                if(err) { reject(err); } else { resolve(); }
              });
            }
          });
        });
      });
    }
  });
}
copyDirContents("./inputs", "./outputs");
  3. Next to that in the same directory, create directories inputs and outputs. In inputs, create the following two files:

Doug.txt:

Hello world! My name is Doug.
I am a fictional character created for the purpose of
this overly simplified demonstration.
My full name is Douglas F. Perjitsy.
My date of birth is June 23, 1979.
My social security number is 345-678-9012.
My primary credit card is a VISA, 4111 1111 1111 1111, expires 04/25, CVV 321.
My primary bank account is a Citibank Citigold Private Client account,
the kind that requires at least a million dollars minimum total balance,
#251616-116272, with username DouglasFranco and password D0ug$password12345.
The short version of my will is that I leave everything to my daughter Christie.
I really appreciate this highly secure information storage service which
automatically periodically reviews content with AI reminders that help me keep
documents like this up to date every few years. 

Eve.txt:

Hi I'm Eve!

  4. Manually audit copy.js in your favorite code editor, ignoring the commented-out line (and whatever you’ve read in this report), to see if you can spot the security issue.
  5. If your answer in the last step was yes, try again from the perspective of an average Node.js developer who’s not a security expert, and consider how that impacts Node’s reputation for security by default. You can also see the fs.write() documentation showing its acceptance of a Buffer object directly.
  6. Run node copy.js.
  7. Inspect outputs/Doug.txt. Notice the large number of null characters at the end of the output file, making it much larger than the corresponding input.
  8. Inspect outputs/Eve.txt, the result of processing Eve’s data, which Eve would presumably have access to.
  9. Notice that moving the declaration of reusableBuffer down at least 2 lines (to make it not actually reusable between files) helps a lot but doesn’t fix the large number of null characters at the end of the output files.
  10. Uncomment the proposed-fix-workaround line and repeat steps 6-8 inclusive.

Note that the coding style in the example is meant to be compact for minimal illustration of this specific issue, not optimal for maintainability etc. (e.g. no error or large-file handling).

Possible fix:

Looking at the source code for streams, it appears that when bytesRead !== buf.length, the code will “shrink to fit” by using Buffer.allocUnsafeSlow(bytesRead) followed by a copy operation. That looks like the correct behavior, and it should be copied over into fs.read, e.g. within the callback wrapper just prior to calling the callback. This strategy is slower and uses more memory than the Buffer.from() strategy above, but it is more resilient to a separate misuse where the same buffer is provided to asynchronous code running in parallel. An example of that case can be derived from the attached example by commenting out lines 15 and 27 (the Promise constructor wrapper) and line 22 (the most indented line).

For maintainability, instead of copying those lines of code as a potential fix, it would be better to split that off into a separate utility function (e.g. shrinkBufferToFit or even Buffer.shrinkToFit(buffer, offset, bytesRead) if Buffer.from() doesn’t cut it) which gets reused in multiple places.

User Impact

Consider the following four use cases, all using CVSS v3.0’s Network attack vector with Node running as a Web server.

0) File Integrity Check

A user stores a file on the server. The server computes a hash of the file, calling hash.update(buffer) (from the crypto module) with each chunk of the file and then hash.digest() to compute the file hash. The documentation for update() indicates it can take a Buffer object, but it does not appear to accept or pay attention to a length (or offset) parameter.

Even with one-chunk files, the result mismatches a hash of the same file computed elsewhere. This renders the file integrity check pretty useless, eliminating the security benefits one could get from the check. This occurs even without the programming-error-in-the-name-of-efficiency of reusing a buffer instead of initializing a new zero-filled buffer with Buffer.alloc(). This use case is how the issue was initially found.

1) Confidential Information Processing

In this use case, the server application supports the processing of confidential information for independent users. For this purpose, “processing” might mean something as simple as writing a file to a place where the user can access it later. The server processes a file which fills most of the buffer for Doug and then, reusing the same Buffer, processes a shorter file for Eve. Eve gains access to Doug’s confidential data, except for the initial portion overwritten by the data Eve provided. Eve could intentionally make her file very short to maximize the exploit value, but even with zero technical knowledge or intent to gain unauthorized access, Eve still gains unauthorized access to some of Doug’s data.

In this scenario, Doug may have followed all modern best security practices in establishing the existence of his confidential information on the target system when becoming a user, perhaps years earlier (e.g. if the buffer was populated by some occasional or periodically run server-initiated process, example here) and Eve’s compromise of the data did not require any interaction from Doug. One could argue that in certain configurations of the unintentional compromise case, this doesn’t require any interaction from the attacker either.

2) Administrator Executable Privilege

In this use case, the server application allows an administrative user to upload executable code or commands which are later executed on the server. Attacker Alice, who has only the minimum level of public or user permissions required to make use of the buffer, fills most or all of the buffer (or at least some portion at its end) with malicious code. Administrator Bob then fills the beginning of the buffer with a short executable that is saved to disk and/or subsequently executed. If the buffer is processed without the end limit at the write/execute stage, Alice’s unauthorized code executes after Bob’s authorized code.

Note that in this scenario, Alice does not strictly have to be attacking intentionally. However, if Alice is not intentionally attacking, the remaining contents of the buffer are likely to be junk that does not execute anything interesting but instead throws an error. If the execution happens in the Node server’s process, that error could cause the whole server process to exit, taking down the site and acting as an effective DoS for legitimate users. The same can happen when the data constitutes commands (or input to commands) for Node to execute, if the functions being executed make assumptions about when or how they will be called, or about their input, which are violated by the junk or malicious data, and the resulting error causes the server process to exit. This seems much more common than the scenario where an application allows an administrator to upload near-arbitrary executable code that is saved and subsequently run on the system. (This DoS strategy has also been observed in a production application with less-than-stellar code quality.)

3) Software repository

In this use case, the server application allows an administrative user to upload executable code which is made available for user/public download (e.g. sites hosting useful software tools for government officials, medical/human-subjects researchers, business users, distribution into the software supply chain, etc.). The attack setup is similar to use case 2, but now the compromised code runs on users’ machines.

Regarding privileges needed to exploit this, note that Web applications could be configured to make publicly available functionality that alters the contents of a reused server-side Buffer. This was observed, for example, in an incorrect usage of the ‘express-fileupload’ module that accessed an underlying ArrayBuffer.

Most relevant first-page results in multiple searches suggested implementations using the whole buffer without any limit based on bytes read, or at least neglected to mention anything about the security practices that Node.js relies on developers to be using. Examples are here.

On the first page of results in Tabnine's search for code snippets using fs.read() I saw just johnsonj561 demonstrating the correct usage. Usage of the entire buffer or with explicit reference to buffer.length seemed to be most common, though a couple examples just checked a subset of initial bytes as needed for their use case, and a couple were local aliases for readFileSync.

Search results also included the official fs documentation, which describes the Buffer passed to the callback in terms that appear identical to what other functions say they take as input. While the official documentation can of course be enhanced to call this out, there is a lot of unofficial documentation, not in our control, which doesn’t; it is not intuitive to have to include that extra step; and if everybody should be using it, why isn’t it included in the Node code to support better security by default? Reuse of the buffer would still be supported, as seen in the demo (with the commented-out line active), unless someone calls read() from within the callback function with an increasing manually-set length value or after end-of-file (the insecure case), using only the callback-parameter version of the buffer instead of the one passed in to read().

Search results also included a fair number of examples with readFile() variants which are a higher level, but which can create some unpredictable memory-based limits on the sizes of files that can be handled. This would seem to mean that conversions from use of readFile() to read() (the easiest-to-find chunk-based alternative which doesn’t have the memory usage constraint) might tend to be made under time pressure for fixing a bug that's actively blocking a real use case.

Code impact

This is likely to impact other similar functions, such as the multiple versions of filehandle.read and the related synchronous/promisified functions. Without having done an in-depth analysis of the source code, I hesitantly assume that, per good coding practice, the other affected functions wrap the same underlying code, so a fix there would cover them as well.

This was previously reported confidentially as a security issue, observing that CVE-2021-22939 was also considered a security issue even though a developer using a Node.js API differently could have avoided it. The response from the Node.js team noted that, like many of Node.js’s low-level APIs, the fs.read() API is modelled after a UNIX API, and UNIX APIs are not seen as vulnerabilities but rather as good models. The buffer is not trimmed in case a user is using the pattern of passing it through for reuse again. The Node.js security team thinks that an update to the official documentation is all that’s needed here, with @mcollina offering to open a PR for that.

How often does it reproduce? Is there a required condition?

Always.

What is the expected behavior?

The buffer provided to the callback from read() and its associated functions is limited to what was read.

What do you see instead?

The buffer contains extra information, which may include confidential information of another user or malicious code.
Depending on how the buffer is then used, this could be quite problematic.

Additional information

I still disagree that an update to official documentation is all that's needed and think that this should be fixed in Node code, on a semver-major release if necessary, but the discussion directed that next steps should be public discussion here on the issue tracker. I include the potential impact points above to help inform that discussion.

Metadata

Assignees: no one assigned
Labels: answered, help wanted, needs more info
Milestone: none
Development: no branches or pull requests