Skip to content

fsmonitor updates for improved performance #212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 21, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion cache.h
Original file line number Diff line number Diff line change
Expand Up @@ -239,7 +239,7 @@ static inline unsigned create_ce_flags(unsigned stage)
#define ce_namelen(ce) ((ce)->ce_namelen)
#define ce_size(ce) cache_entry_size(ce_namelen(ce))
#define ce_stage(ce) ((CE_STAGEMASK & (ce)->ce_flags) >> CE_STAGESHIFT)
#define ce_uptodate(ce) ((ce)->ce_flags & CE_UPTODATE)
#define ce_uptodate(ce) (((ce)->ce_flags & CE_UPTODATE) || ((ce)->ce_flags & CE_FSMONITOR_VALID))
#define ce_skip_worktree(ce) ((ce)->ce_flags & CE_SKIP_WORKTREE)
#define ce_mark_uptodate(ce) ((ce)->ce_flags |= CE_UPTODATE)
#define ce_intent_to_add(ce) ((ce)->ce_flags & CE_INTENT_TO_ADD)
Expand Down
32 changes: 23 additions & 9 deletions fsmonitor.c
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data,
}
istate->fsmonitor_dirty = fsmonitor_dirty;

if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
if (!istate->split_index && istate->fsmonitor_dirty->bit_size > istate->cache_nr)
BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
(uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);

Expand Down Expand Up @@ -83,7 +83,7 @@ void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
uint32_t ewah_size = 0;
int fixup = 0;

if (istate->fsmonitor_dirty->bit_size > istate->cache_nr)
if (!istate->split_index && istate->fsmonitor_dirty->bit_size > istate->cache_nr)
BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)",
(uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);

Expand Down Expand Up @@ -189,13 +189,29 @@ void refresh_fsmonitor(struct index_state *istate)
}
if (bol < query_result.len)
fsmonitor_refresh_callback(istate, buf + bol);

/* Now mark the untracked cache for fsmonitor usage */
if (istate->untracked)
istate->untracked->use_fsmonitor = 1;
} else {
/*
* We only want to run the post index changed hook if we've
* actually changed entries, so keep track if we actually
* changed entries or not
*/
int is_cache_changed = 0;

/* Mark all entries invalid */
for (i = 0; i < istate->cache_nr; i++)
istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
for (i = 0; i < istate->cache_nr; i++) {
if (istate->cache[i]->ce_flags & CE_FSMONITOR_VALID) {
is_cache_changed = 1;
istate->cache[i]->ce_flags &= ~CE_FSMONITOR_VALID;
}
}

/* If we're going to check every file, ensure we save the results */
istate->cache_changed |= FSMONITOR_CHANGED;
/* If we're going to check every file, ensure we save results */
if (is_cache_changed)
istate->cache_changed |= FSMONITOR_CHANGED;

if (istate->untracked)
istate->untracked->use_fsmonitor = 0;
Expand Down Expand Up @@ -257,9 +273,7 @@ void tweak_fsmonitor(struct index_state *istate)
(uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr);
ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, istate);

/* Now mark the untracked cache for fsmonitor usage */
if (istate->untracked)
istate->untracked->use_fsmonitor = 1;
refresh_fsmonitor(istate);
}

ewah_free(istate->fsmonitor_dirty);
Expand Down
9 changes: 7 additions & 2 deletions t/t7519-status-fsmonitor.sh
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,8 @@ EOF

# test that "update-index --fsmonitor-valid" sets the fsmonitor valid bit
test_expect_success 'update-index --fsmonitor-valid" sets the fsmonitor valid bit' '
write_script .git/hooks/fsmonitor-test<<-\EOF &&
EOF
git update-index --fsmonitor &&
git update-index --fsmonitor-valid dir1/modified &&
git update-index --fsmonitor-valid dir2/modified &&
Expand Down Expand Up @@ -164,6 +166,8 @@ EOF

# test that newly added files are marked valid
test_expect_success 'newly added files are marked valid' '
write_script .git/hooks/fsmonitor-test<<-\EOF &&
EOF
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I do not understand this diff hunk. previously, we let the test fsmonitor tell Git to look at the new files, but now we don't?

Is this a diff hunk that is not exactly necessary for the test case to pass, but merely to make the test more accurate, by disallowing fsmonitor to trigger lstat()s on the three new files and instead forcing it to rely on the implicit information given by the git add` commands?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. The issue is that with these changes the fsmonitor data is refreshed when any command reads the index. Left unchanged the git ls-files would refresh and be using the dirty files from the previous test which would include the newly added files and they would be marked as dirty which would fail this test. Perhaps I can use the test-tool dump-fsmonitor to get the bitmap to compare against instead of using ls-files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up question here: what behavior is this test specifically trying to validate?

The comment says:

test that newly added files are marked valid

But based on the change you made it seems like newly added files will not always be marked as valid (i.e. if they were dirty before they will still be considered dirty after the git add).

Is there there something I'm missing (e.g. was the test .git/hooks/fsmonitor-test behaving incorrectly, and that's why it needs to be set to empty at the start of the test)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe for this test the

write_script .git/hooks/fsmonitor-test<<-\EOF &&
EOF

needs to be before the git ls-files so that the git add runs with the paths in .git/hooks/fsmonitor-test but for each git add the single path for the added file would need to be in .git/hooks/fsmonitor-test otherwise the last git add would mark the other paths as dirty and save that index out.

I'm taking a closer look at the tests because before the content of the .git/hooks/fsmonitor-test did not affect commands that were not refreshing cache entries whereas now any commands that reads the index will get the entries marked dirty that are in the .git/hooks/fsmonitor-test. This means that even git ls-files which is being used to validate the CE_FSMONITOR_VALID could have that affected by what is in .git/hooks/fsmonitor-test. So if left with all the paths in .git/hooks/fsmonitor-test, those paths will be dirty for every git command that reads the index.

So in most cases we need to make sure .git/hooks/fsmonitor-test is empty for any validating commands like git ls-files so that the contents of that file will not change what cache entries are dirty.

I might also try using the test-dump-fsmonitor since that it only dumping the index entries before applying the bitmap or the .git/hooks/fsmonitor-test paths.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or each git add the single path for the added file would need to be in .git/hooks/fsmonitor-test otherwise the last git add would mark the other paths as dirty and save that index out.

Ahh, this is the part I was missing, thanks!

git add new &&
git add dir1/new &&
git add dir2/new &&
Expand Down Expand Up @@ -218,11 +222,12 @@ test_expect_success '*only* files returned by the integration script get flagged
# Ensure commands that call refresh_index() to move the index back in time
# properly invalidate the fsmonitor cache
test_expect_success 'refresh_index() invalidates fsmonitor cache' '
write_script .git/hooks/fsmonitor-test<<-\EOF &&
EOF
clean_repo &&
dirty_repo &&
write_integration_script &&
git add . &&
write_script .git/hooks/fsmonitor-test<<-\EOF &&
EOF
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically the change in 'newly added files are marked valid' made it so that the fsmonitor is not allowed to tell Git to look at any file's stat in all the test cases up until here. Hmm. I would really like to have at least a paragraph in the commit message providing a compelling argument why that is a good thing.

And then I really don't understand why we have to have the full fsmonitor-test script again, but only for dirty_repo and for git add .?

git commit -m "to reset" &&
git reset HEAD~1 &&
git status >actual &&
Expand Down
22 changes: 1 addition & 21 deletions t/t7519/fsmonitor-watchman
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,6 @@ use IPC::Open2;
# 'git config core.fsmonitor .git/hooks/query-watchman'
#
my ($version, $time) = @ARGV;
#print STDERR "$0 $version $time\n";

# Check the hook interface version

Expand Down Expand Up @@ -45,7 +44,7 @@ launch_watchman();

sub launch_watchman {

my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j')
my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty')
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";

Expand All @@ -63,19 +62,11 @@ sub launch_watchman {
"fields": ["name"]
}]
END

open (my $fh, ">", ".git/watchman-query.json");
print $fh $query;
close $fh;

print CHLD_IN $query;
close CHLD_IN;
my $response = do {local $/; <CHLD_OUT>};

open ($fh, ">", ".git/watchman-response.json");
print $fh $response;
close $fh;

die "Watchman: command returned no output.\n" .
"Falling back to scanning...\n" if $response eq "";
die "Watchman: command returned invalid output: $response\n" .
Expand All @@ -94,7 +85,6 @@ sub launch_watchman {
my $o = $json_pkg->new->utf8->decode($response);

if ($retry > 0 and $o->{error} and $o->{error} =~ m/unable to resolve root .* directory (.*) is not watched/) {
print STDERR "Adding '$git_work_tree' to watchman's watch list.\n";
$retry--;
qx/watchman watch "$git_work_tree"/;
die "Failed to make watchman watch '$git_work_tree'.\n" .
Expand All @@ -104,11 +94,6 @@ sub launch_watchman {
# return the fast "everything is dirty" flag to git and do the
# Watchman query just to get it over with now so we won't pay
# the cost in git to look up each individual file.

open ($fh, ">", ".git/watchman-output.out");
print "/\0";
close $fh;

print "/\0";
eval { launch_watchman() };
exit 0;
Expand All @@ -117,11 +102,6 @@ sub launch_watchman {
die "Watchman: $o->{error}.\n" .
"Falling back to scanning...\n" if $o->{error};

open ($fh, ">", ".git/watchman-output.out");
binmode $fh, ":utf8";
print $fh @{$o->{files}};
close $fh;

binmode STDOUT, ":utf8";
local $, = "\0";
print @{$o->{files}};
Expand Down
128 changes: 128 additions & 0 deletions t/t7519/fsmonitor-watchman-debug
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
#!/usr/bin/perl

use strict;
use warnings;
use IPC::Open2;

# An example hook script to integrate Watchman
# (https://facebook.github.io/watchman/) with git to speed up detecting
# new and modified files.
#
# The hook is passed a version (currently 1) and a time in nanoseconds
# formatted as a string and outputs to stdout all files that have been
# modified since the given time. Paths must be relative to the root of
# the working tree and separated by a single NUL.
#
# To enable this hook, rename this file to "query-watchman" and set
# 'git config core.fsmonitor .git/hooks/query-watchman'
#
my ($version, $time) = @ARGV;
#print STDERR "$0 $version $time\n";

# Check the hook interface version

if ($version == 1) {
# convert nanoseconds to seconds
# subtract one second to make sure watchman will return all changes
$time = int ($time / 1000000000) - 1;
} else {
die "Unsupported query-fsmonitor hook version '$version'.\n" .
"Falling back to scanning...\n";
}

my $git_work_tree;
if ($^O =~ 'msys' || $^O =~ 'cygwin') {
$git_work_tree = Win32::GetCwd();
$git_work_tree =~ tr/\\/\//;
} else {
require Cwd;
$git_work_tree = Cwd::cwd();
}

my $retry = 1;

launch_watchman();

sub launch_watchman {

my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j')
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";

# In the query expression below we're asking for names of files that
# changed since $time but were not transient (ie created after
# $time but no longer exist).
#
# To accomplish this, we're using the "since" generator to use the
# recency index to select candidate nodes and "fields" to limit the
# output to file names only.

my $query = <<" END";
["query", "$git_work_tree", {
"since": $time,
"fields": ["name"]
}]
END

open (my $fh, ">", ".git/watchman-query.json");
print $fh $query;
close $fh;

print CHLD_IN $query;
close CHLD_IN;
my $response = do {local $/; <CHLD_OUT>};

open ($fh, ">", ".git/watchman-response.json");
print $fh $response;
close $fh;

die "Watchman: command returned no output.\n" .
"Falling back to scanning...\n" if $response eq "";
die "Watchman: command returned invalid output: $response\n" .
"Falling back to scanning...\n" unless $response =~ /^\{/;

my $json_pkg;
eval {
require JSON::XS;
$json_pkg = "JSON::XS";
1;
} or do {
require JSON::PP;
$json_pkg = "JSON::PP";
};

my $o = $json_pkg->new->utf8->decode($response);

if ($retry > 0 and $o->{error} and $o->{error} =~ m/unable to resolve root .* directory (.*) is not watched/) {
print STDERR "Adding '$git_work_tree' to watchman's watch list.\n";
$retry--;
qx/watchman watch "$git_work_tree"/;
die "Failed to make watchman watch '$git_work_tree'.\n" .
"Falling back to scanning...\n" if $? != 0;

# Watchman will always return all files on the first query so
# return the fast "everything is dirty" flag to git and do the
# Watchman query just to get it over with now so we won't pay
# the cost in git to look up each individual file.

open ($fh, ">", ".git/watchman-output.out");
print "/\0";
close $fh;

print "/\0";
eval { launch_watchman() };
exit 0;
}

die "Watchman: $o->{error}.\n" .
"Falling back to scanning...\n" if $o->{error};

open ($fh, ">", ".git/watchman-output.out");
binmode $fh, ":utf8";
print $fh @{$o->{files}};
close $fh;

binmode STDOUT, ":utf8";
local $, = "\0";
print @{$o->{files}};
}
3 changes: 3 additions & 0 deletions unpack-trees.c
Original file line number Diff line number Diff line change
Expand Up @@ -1584,6 +1584,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
o->merge_size = len;
mark_all_ce_unused(o->src_index);

if (o->src_index->fsmonitor_last_update)
o->result.fsmonitor_last_update = o->src_index->fsmonitor_last_update;

/*
* Sparse checkout loop #1: set NEW_SKIP_WORKTREE on existing entries
*/
Expand Down