Skip to content

Avoid writing files that are not changed while compiling incrementally. #6937

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 4, 2016

Conversation

vilicvane
Copy link

#3113

If output and related file satisfy these conditions, it will now skip outputting the file:

  1. An output of this file has been made and the last modified time has not changed comparing to last actual output.
  2. The md5 hash of output has not changed comparing to last actual output.

function writeFile(fileName: string, data: string, writeByteOrderMark?: boolean): void {
// If a BOM is required, emit one
if (writeByteOrderMark) {
data = "\uFEFF" + data;
}

const md5 = getMd5(data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i do not think we should do this for all files. in a normal batch compilation secnario, you will doing a fileExits, stat, and createHash needlessly.

Instead, we should expose a new optional function on sys to getModifiedTime, and createHash. in the watch logic, we can query them as needed, and store the map outputFileName => { mtime, hash} there as well.

@mhegazy
Copy link
Contributor

mhegazy commented Feb 8, 2016

as noted above, we should split this into infrastructure, in sys, and the calls should be part of the watch implementation.

@vilicvane
Copy link
Author

@mhegazy Updated~ 😄

const hash = sys.createHash(data);
const mtimeBefore = sys.getModifiedTime(fileName);

if (mtimeBefore && outputFingerprints.hasOwnProperty(fileName)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the hasProperty function

const outputFingerprints: Map<OutputFingerprint> =
options.watch && sys.createHash && sys.getModifiedTime ? {} : undefined;

const fileWriter: typeof sys.writeFile = outputFingerprints ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just a function?

function writeFile(fileName, data, writeByteOrderMark) {
    if (outputFingerprints) {
        const hash = sys.createHash(data);
        const mtimeBefore = sys.getModifiedTime(fileName);

        if (mtimeBefore && hasProperty(outputFingerprints, fileName)) {
            const fingerprint = outputFingerprints[fileName];

            // If output has not been changed, and the file has no external modification
            if (fingerprint.byteOrderMark === writeByteOrderMark &&
                fingerprint.hash === hash &&
                fingerprint.mtime.getTime() === mtimeBefore.getTime()) {
                return;
            }
        }
    }

    sys.writeFile(fileName, data, writeByteOrderMark);

    if (outputFingerprints) {
        const mtimeAfter = sys.getModifiedTime(fileName);

        outputFingerprints[fileName] = {
            hash,
            byteOrderMark: writeByteOrderMark,
            mtime: mtimeAfter
        };
    }
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mhegazy I personally do not like parallel branches with same condition (because they can not indicate a strong relationship between those branches), so if I write them into a single function, I would like a single if...else with two sys.writeFile in each branch. And after merging them together, function fileWriter would be awkward to exist alone with writeFile. If I inline fileWriter into writeFile, it would result in a long try block. 😢 What's your suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about two functions then:

const outputFingerprints: Map<OutputFingerprint>;


function writeFile(fileName, data, writeByteOrderMark) {
    if (options.watch && sys.createHash && sys.getModifiedTime) {
        writeFileIfUpdated(fileName, data, writeByteOrderMark);
    }
    else {
        sys.writeFile(fileName, data, writeByteOrderMark);
    }
}

function writeFileIfUpdated(fileName, data, writeByteOrderMark) {
    if (!outputFingerprints) {
        outputFingerprints = {};
    }

    const hash = sys.createHash(data);
    const mtimeBefore = sys.getModifiedTime(fileName);

    if (mtimeBefore && hasProperty(outputFingerprints, fileName)) {
        const fingerprint = outputFingerprints[fileName];

        // If output has not been changed, and the file has no external modification
        if (fingerprint.byteOrderMark === writeByteOrderMark &&
            fingerprint.hash === hash &&
            fingerprint.mtime.getTime() === mtimeBefore.getTime()) {
            return;
        }

        sys.writeFile(fileName, data, writeByteOrderMark);

        const mtimeAfter = sys.getModifiedTime(fileName);

        outputFingerprints[fileName] = {
            hash,
            byteOrderMark: writeByteOrderMark,
            mtime: mtimeAfter
        };
    }
}

@vilicvane
Copy link
Author

@mhegazy Thanks, updated~

@mhegazy
Copy link
Contributor

mhegazy commented Feb 11, 2016

👍

@mhegazy
Copy link
Contributor

mhegazy commented Feb 11, 2016

@DanielRosenwasser any more comments?

@DanielRosenwasser
Copy link
Member

Everything looks good to me. We should actually pull down the change and test out something larger like Monaco to see what sort of differences we can observe first.

@vilicvane
Copy link
Author

@DanielRosenwasser Tried it with VS Code (1.8 beta v.s. 1.8 beta cherry-picking related commits) on Windows 10.

i7 5500u / 512G PCI-e SSD (RW 1+GB/s)

Without PR

Files:            1149
Lines:          316504
Nodes:         1623336
Identifiers:    509501
Symbols:     125935924
Types:          124763
Memory used:   812866K
I/O read:        0.32s
I/O write:       4.81s
Parse time:      4.69s
Bind time:       1.70s
Check time:     12.04s
Emit time:       7.23s
Total time:     25.67s
1:20:15 AM - Compilation complete. Watching for file changes.
1:20:21 AM - File change detected. Starting incremental compilation...
Files:            1149
Lines:          316504
Nodes:         1623336
Identifiers:    509501
Symbols:     126064848
Types:          124763
Memory used:   947241K
I/O read:        0.00s
I/O write:       4.13s
Parse time:      1.72s
Bind time:       0.05s
Check time:     10.64s
Emit time:       6.30s
Total time:     18.70s
1:20:40 AM - Compilation complete. Watching for file changes.

With PR

Files:            1149
Lines:          316504
Nodes:         1623336
Identifiers:    509501
Symbols:     125935924
Types:          124763
Memory used:   799590K
I/O read:        0.22s
I/O write:       4.74s
Parse time:      4.06s
Bind time:       2.05s
Check time:     11.04s
Emit time:       7.45s
Total time:     24.61s
1:17:41 AM - Compilation complete. Watching for file changes.
1:18:31 AM - File change detected. Starting incremental compilation...
Files:            1149
Lines:          316504
Nodes:         1623336
Identifiers:    509501
Symbols:     126064848
Types:          124763
Memory used:   905468K
I/O read:        0.02s
I/O write:       0.21s
Parse time:      1.85s
Bind time:       0.05s
Check time:     10.45s
Emit time:       2.17s
Total time:     14.53s
1:18:46 AM - Compilation complete. Watching for file changes.

But if I exclude folder from Windows Defender, it doesn't help much:

Without PR

Files:            1149
Lines:          316504
Nodes:         1623336
Identifiers:    509501
Symbols:     125935924
Types:          124763
Memory used:   817125K
I/O read:        0.74s
I/O write:       0.91s
Parse time:      5.17s
Bind time:       1.70s
Check time:     11.63s
Emit time:       3.77s
Total time:     22.28s
1:25:02 AM - Compilation complete. Watching for file changes.
1:25:08 AM - File change detected. Starting incremental compilation...
Files:            1149
Lines:          316504
Nodes:         1623336
Identifiers:    509501
Symbols:     126064848
Types:          124763
Memory used:   920819K
I/O read:        0.00s
I/O write:       0.82s
Parse time:      1.71s
Bind time:       0.05s
Check time:     10.49s
Emit time:       2.97s
Total time:     15.22s
1:25:24 AM - Compilation complete. Watching for file changes.

With PR

Files:            1149
Lines:          316504
Nodes:         1623336
Identifiers:    509501
Symbols:     125935924
Types:          124763
Memory used:   810966K
I/O read:        0.29s
I/O write:       1.08s
Parse time:      4.34s
Bind time:       1.79s
Check time:     10.99s
Emit time:       3.95s
Total time:     21.07s
1:27:43 AM - Compilation complete. Watching for file changes.
1:27:51 AM - File change detected. Starting incremental compilation...
Files:            1149
Lines:          316504
Nodes:         1623336
Identifiers:    509501
Symbols:     126064848
Types:          124763
Memory used:   910171K
I/O read:        0.00s
I/O write:       0.30s
Parse time:      1.66s
Bind time:       0.03s
Check time:     10.38s
Emit time:       2.24s
Total time:     14.30s
1:28:05 AM - Compilation complete. Watching for file changes.

@mhegazy
Copy link
Contributor

mhegazy commented Feb 17, 2016

I do not think the value of this change is performance optimization ( as you noted, io write time is usually negligible compared to the rest of the compilation process), the value is not updating the file timestamps unless you need to, that allows for other tools in the tool chain that are watching the outputs to better optimize their watch behavior.

I am busy with the 1.8 release logistics at the moment, but should be able to pull down the change and work with it for a bit, to get some test coverage.

@@ -559,6 +559,12 @@ namespace ts {
sourceMap: false,
};

interface OutputFingerprint {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mark this /* @internal */

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DanielRosenwasser I am confused, it's not even exported, why does it need /** @internal */?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops! Misread the code, forgive me. I was taking a quick glance at this PR again since it's been a bit.

@mhegazy mhegazy merged commit a481305 into microsoft:master Apr 4, 2016
@mhegazy
Copy link
Contributor

mhegazy commented Apr 4, 2016

sorry for the delay. merged manually.

@vilicvane
Copy link
Author

@mhegazy 😄 Thanks~

@rasmusvhansen
Copy link

rasmusvhansen commented Apr 20, 2016

Is there any reason to check the fileModified timestamp?

I have a setup where I (for performance reasons) have the compilation split into three different processes (with a bit of overlap). This means that in the overlapping cases, the hash will match, but the fileModified time will not. It works fine if I remove the file time check from writeFileIfUpdated

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants