fsmonitor updates for improved performance #212

kewillford · 2019-10-22T20:00:24Z

This change does two main things:

1. Adds a check for the CE_FSMONITOR_VALID flag to the ce_uptodate macro, so that whenever the code checks whether an entry is up to date, the fsmonitor flag is taken into consideration.
2. Keeps the fsmonitor data when unpacking trees, so that the next command does not have to pay the price of checking all the entries.
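To make the first point concrete, here is a minimal sketch of what such a macro change could look like. It is not the actual patch: the struct is a trimmed-down, hypothetical stand-in for Git's cache entry, the flag bit values are illustrative, and `ce_uptodate_old` and `count_entries_needing_lstat` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bits; the real definitions live in Git's headers. */
#define CE_UPTODATE        (1u << 18)
#define CE_FSMONITOR_VALID (1u << 21)

/* Hypothetical, trimmed-down stand-in for Git's struct cache_entry. */
struct cache_entry {
	unsigned int ce_flags;
	const char *name;
};

/* Assumed "before" behavior: only CE_UPTODATE counts. */
#define ce_uptodate_old(ce) (((ce)->ce_flags & CE_UPTODATE) != 0)

/* Assumed "after" behavior: an entry fsmonitor vouches for also counts. */
#define ce_uptodate(ce) \
	(((ce)->ce_flags & (CE_UPTODATE | CE_FSMONITOR_VALID)) != 0)

/* Count the entries that would still need an lstat() under the new rule. */
static size_t count_entries_needing_lstat(const struct cache_entry *entries,
					  size_t nr)
{
	size_t needing = 0;

	for (size_t i = 0; i < nr; i++)
		if (!ce_uptodate(&entries[i]))
			needing++;
	return needing;
}
```

With this shape, every existing caller of the macro picks up the fsmonitor information without being changed individually, which is presumably why the check is placed in the macro rather than at each call site.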

kewillford · 2019-10-31T17:52:20Z

Here are some of the performance differences with these changes (durations in seconds):

Command	Previous duration (s)	Current duration (s)	Difference (s)	Change (%)
checkout after folderplaceholder enumeration	17.40225	3.21683	-14.18542	-81.51486
status 1 after checkout feature/gvfs/perftest/defaultBranch	15.53006	3.03953	-12.49053	-80.42809
status 1 after checkout -b user/me/topic feature/gvfs/perftest/defaultBranch~1	15.66574	3.09206	-12.57368	-80.26228
status 1 after pull --ff-only origin feature/gvfs/perftest/checkoutBranch	16.08161	3.62207	-12.45954	-77.47694
status 1 after checkout -b pullTest feature/gvfs/perftest/checkoutBranch~10	15.61599	3.63645	-11.97954	-76.71329
status 1 after merge feature/gvfs/perftest/defaultBranch --no-commit	13.1538	3.17194	-9.98186	-75.88575
git reset --hard HEAD	26.76172	6.68994	-20.07178	-75.00183
status 1 after reset --hard HEAD	12.11019	3.15032	-8.95987	-73.98621
status 1 after checkout feature/gvfs/perftest/checkoutBranch	17.4508	4.88049	-12.57031	-72.03286
status 1 after stash	10.89549	3.09768	-7.79781	-71.56915
git add --all	30.97013	10.36075	-20.60938	-66.54599
status 2 after First ReadFiles	4.35711	1.60531	-2.7518	-63.15654
git stash	31.56066	11.86821	-19.69245	-62.39556
git merge feature/gvfs/perftest/defaultBranch --no-commit	21.15363	8.60551	-12.54812	-59.31899
git stash pop	19.23136	9.56369	-9.66767	-50.27034
git rebase feature/gvfs/perftest/defaultBranch	26.31549	13.19811	-13.11738	-49.84661
status 1 after merge --abort	14.68546	7.55865	-7.12681	-48.5297
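For readers who want to check the last two columns, they follow from the first two by plain arithmetic. The sketch below assumes the difference is current minus previous and the percent change is that difference relative to the previous duration; the function names are made up for this example.

```c
#include <assert.h>

/* Difference in seconds: negative means the command got faster. */
static double seconds_difference(double previous, double current)
{
	return current - previous;
}

/* Relative change in percent, measured against the previous duration. */
static double percent_change(double previous, double current)
{
	assert(previous > 0.0);
	return (current - previous) / previous * 100.0;
}
```

For the first row, `percent_change(17.40225, 3.21683)` comes out at roughly -81.5, matching the table.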

derrickstolee · 2019-11-01T14:36:00Z

We need to figure something out about how fsmonitor talks specifically to watchman. We are not robust to script-level frequency (my test is on v2.24.0-rc2).

The GIT_TEST_FSMONITOR environment variable can take a hook path, and there is an included hook for Watchman. There are numerous issues with this integration on Linux (we cannot delete repos after registering them with Watchman, so that causes many test failures), but also even the simple test_commit function doesn't work!

For example:

GIT_TRACE=1 GIT_TEST_FSMONITOR="$(pwd)/t7519/fsmonitor-watchman" ./t7060-wtstatus.sh -x -v -d -i

In test 5, the following commands are run in order:

		test_commit initial foo "" &&
		test_commit modify foo foo &&

and here are the logs for those two lines:

+ test_commit initial foo 
+ notick=
+ signoff=
+ indir=
+ test 3 != 0
+ break
+ indir=
+ file=foo
+ echo 
+ git add foo
trace: built-in: git add foo
trace: run_command: cd '/_git/git/t/trash directory.t7060-wtstatus/mdconflict'; /_git/git/t/t7519/fsmonitor-watchman 1 1572618499310242307
Adding '/_git/git/t/trash directory.t7060-wtstatus/mdconflict' to watchman's watch list.
+ test -z 
+ test_tick
+ test -z set
+ test_tick=1112912173
+ GIT_COMMITTER_DATE=1112912173 -0700
+ GIT_AUTHOR_DATE=1112912173 -0700
+ export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
+ git commit -m initial
trace: built-in: git commit -m initial
trace: run_command: cd '/_git/git/t/trash directory.t7060-wtstatus/mdconflict'; /_git/git/t/t7519/fsmonitor-watchman 1 1572618499374543700
trace: run_command: git gc --auto
trace: built-in: git gc --auto
[master (root-commit) a3c5375] initial
 Author: A U Thor <[email protected]>
 1 file changed, 1 insertion(+)
 create mode 100644 foo
+ git tag initial
trace: built-in: git tag initial
+ test_commit modify foo foo
+ notick=
+ signoff=
+ indir=
+ test 3 != 0
+ break
+ indir=
+ file=foo
+ echo foo
+ git add foo
trace: built-in: git add foo
trace: run_command: cd '/_git/git/t/trash directory.t7060-wtstatus/mdconflict'; /_git/git/t/t7519/fsmonitor-watchman 1 1572618499425203804
+ test -z 
+ test_tick
+ test -z set
+ test_tick=1112912233
+ GIT_COMMITTER_DATE=1112912233 -0700
+ GIT_AUTHOR_DATE=1112912233 -0700
+ export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
+ git commit -m modify
trace: built-in: git commit -m modify
trace: run_command: cd '/_git/git/t/trash directory.t7060-wtstatus/mdconflict'; /_git/git/t/t7519/fsmonitor-watchman 1 1572618499517349606
On branch master
nothing to commit, working tree clean

The second git commit -m modify fails because the second git add foo did nothing. It triggered the watchman call, but did not receive a change for that file, so did not update the index. (I verified this by adding an extra git status call in the test_commit code and re-running the test.)

We will need to think about how to make our Watchman integration more robust and set up some automation to run the test suite with Watchman specifically.

cc: @kewillford, @jrbriggs, @wilbaker, @dscho

jeffhostetler · 2019-11-01T15:39:27Z

@derrickstolee Can you run this with GIT_TRACE_FSMONITOR turned on too? Not sure if that'll help or not, but worth a shot.

D'oh, I just noticed that you did have that turned on in the above command line. Is there anything in that other log file?

dscho

Looks good!

I asked for a few clarifications, and suggest splitting the change that copies the last_update when copying an index into its own commit, but none of these are super-important.

But please add your sign-off to the commit messages.

t/t7113-post-index-change-hook.sh

t/t7519-status-fsmonitor.sh

dscho · 2019-11-01T20:13:23Z

t/t7519-status-fsmonitor.sh

@@ -164,6 +166,8 @@ EOF

 # test that newly added files are marked valid
 test_expect_success 'newly added files are marked valid' '
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+	EOF


I guess I do not understand this diff hunk. Previously, we let the test fsmonitor tell Git to look at the new files, but now we don't?

Is this a diff hunk that is not strictly necessary for the test case to pass, but merely makes the test more accurate, by preventing fsmonitor from triggering lstat()s on the three new files and instead forcing it to rely on the implicit information given by the git add commands?

kewillford · 2019-11-05

Thanks for pointing this out. The issue is that with these changes the fsmonitor data is refreshed whenever any command reads the index. Left unchanged, git ls-files would refresh using the dirty files from the previous test, which include the newly added files; they would be marked as dirty, and that would fail this test. Perhaps I can use test-tool dump-fsmonitor to get the bitmap to compare against instead of using ls-files.

wilbaker · 2019-11-05

Follow-up question here: what behavior is this test specifically trying to validate?

The comment says:

test that newly added files are marked valid

But based on the change you made it seems like newly added files will not always be marked as valid (i.e. if they were dirty before they will still be considered dirty after the git add).

Is there something I'm missing (e.g. was the test .git/hooks/fsmonitor-test behaving incorrectly, and that's why it needs to be set to empty at the start of the test)?

kewillford · 2019-11-05

Maybe for this test the

	write_script .git/hooks/fsmonitor-test<<-\EOF &&
	EOF

needs to be before the git ls-files, so that the git add runs with the paths in .git/hooks/fsmonitor-test. But then, for each git add, the single path of the added file would need to be in .git/hooks/fsmonitor-test; otherwise the last git add would mark the other paths as dirty and save that index out.

I'm taking a closer look at the tests because, before, the content of .git/hooks/fsmonitor-test did not affect commands that were not refreshing cache entries, whereas now any command that reads the index will mark as dirty the entries listed in .git/hooks/fsmonitor-test. This means that even git ls-files, which is being used to validate CE_FSMONITOR_VALID, could be affected by what is in .git/hooks/fsmonitor-test. So if all the paths are left in .git/hooks/fsmonitor-test, those paths will be dirty for every git command that reads the index.

So in most cases we need to make sure .git/hooks/fsmonitor-test is empty for any validating commands like git ls-files so that the contents of that file will not change what cache entries are dirty.

I might also try using test-dump-fsmonitor, since it only dumps the index entries before applying the bitmap or the .git/hooks/fsmonitor-test paths.
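The effect described above can be modeled in a few lines. This is a deliberately simplified sketch, not Git's code: it assumes the hook reports exact file paths only, the struct and flag value are illustrative stand-ins, and `apply_hook_paths` and `count_fsmonitor_valid` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CE_FSMONITOR_VALID (1u << 21) /* illustrative bit value */

/* Hypothetical, trimmed-down stand-in for Git's struct cache_entry. */
struct cache_entry {
	unsigned int ce_flags;
	const char *name;
};

/*
 * Model of a refresh on index read: every entry whose path the hook
 * reported loses CE_FSMONITOR_VALID. An empty report changes nothing.
 */
static void apply_hook_paths(struct cache_entry *entries, size_t nr,
			     const char *const *reported, size_t reported_nr)
{
	for (size_t i = 0; i < nr; i++)
		for (size_t j = 0; j < reported_nr; j++)
			if (!strcmp(entries[i].name, reported[j]))
				entries[i].ce_flags &= ~CE_FSMONITOR_VALID;
}

/* Count the entries fsmonitor still considers clean. */
static size_t count_fsmonitor_valid(const struct cache_entry *entries,
				    size_t nr)
{
	size_t valid = 0;

	for (size_t i = 0; i < nr; i++)
		if (entries[i].ce_flags & CE_FSMONITOR_VALID)
			valid++;
	return valid;
}
```

In this model, a validating command that reads the index while the hook still lists paths would itself clear the flag it is trying to observe, which is the reason for emptying the hook first.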

wilbaker · 2019-11-05

for each git add the single path for the added file would need to be in .git/hooks/fsmonitor-test otherwise the last git add would mark the other paths as dirty and save that index out.

Ahh, this is the part I was missing, thanks!

dscho · 2019-11-01T20:17:23Z

t/t7519-status-fsmonitor.sh

 	dirty_repo &&
 	git add . &&
+	write_script .git/hooks/fsmonitor-test<<-\EOF &&
+	EOF


So basically the change in 'newly added files are marked valid' made it so that the fsmonitor is not allowed to tell Git to look at any file's stat in all the test cases up until here. Hmm. I would really like to have at least a paragraph in the commit message providing a compelling argument why that is a good thing.

And then I really don't understand why we have to have the full fsmonitor-test script again, but only for dirty_repo and for git add .?

unpack-trees.c

fsmonitor.c

wilbaker

Changes look good, same questions as @dscho on the updates to the tests.

derrickstolee · 2019-11-06T18:21:03Z

@kewillford when you get this rebased on top of features/sparse-checkout-2.24.0 then I can launch the C# tests using watchman for additional confidence.

derrickstolee · 2019-11-12T18:56:02Z

@kewillford I kicked off a new build for you here.

3444ec2 ("fsmonitor: don't fill bitmap with entries to be removed", 2019-10-11) added a handful of sanity checks that make sure that a bit position in the fsmonitor bitmap does not go beyond the end of the index. As each bit in the bitmap corresponds to a path in the index, this is the right check most of the time.

Except for the case when we are in split-index mode and looking at a delta index that is to be overlaid on the base index but before the base index has actually been merged in, namely in read_ and write_fsmonitor_extension(). In these codepaths, the entries in the split/delta index are typically a small subset of the entire set of paths (otherwise why would we be using split-index?), so the bitmap used by the fsmonitor is almost always larger than the number of entries in the partial index, and the incorrect comparison would trigger the BUG().

Signed-off-by: Kevin Willford <[email protected]>
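One plausible shape for the corrected sanity check is sketched below. This is an assumption about the approach, not the actual patch: the struct, its field names, and `fsmonitor_bitmap_size_ok` are hypothetical, and the real code signals the failure with BUG() rather than a return value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for Git's struct index_state. */
struct index_state {
	size_t cache_nr;             /* entries in this (possibly delta) index */
	size_t fsmonitor_dirty_bits; /* size of the fsmonitor bitmap, in bits */
	int is_split_delta;          /* nonzero while the base is not merged in */
};

/*
 * Returns 1 when the bitmap size is acceptable, 0 when the sanity check
 * should fire. A split/delta index holds only a subset of the paths, so
 * comparing the bitmap against its entry count is meaningless and skipped.
 */
static int fsmonitor_bitmap_size_ok(const struct index_state *istate)
{
	if (istate->is_split_delta)
		return 1;
	return istate->fsmonitor_dirty_bits <= istate->cache_nr;
}
```

The design choice here is to skip the comparison entirely in the delta case; comparing against the base index's entry count instead would be another option.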

When using fsmonitor, the CE_FSMONITOR_VALID flag should be checked when wanting to know if the entry has been updated. If the flag is set, the entry should be considered up to date, the same as if CE_UPTODATE is set.

In order to trust the CE_FSMONITOR_VALID flag, the fsmonitor data needs to be refreshed when the fsmonitor bitmap is applied to the index in tweak_fsmonitor. Since the fsmonitor data is kept up to date for every command, some tests needed to be updated to take that into account.

istate->untracked->use_fsmonitor was set in tweak_fsmonitor when the fsmonitor bitmap data was loaded and is now set in refresh_fsmonitor, since that is being called in tweak_fsmonitor. refresh_fsmonitor will only be called once, and any other callers should be setting it when refreshing the fsmonitor data so that code can use the fsmonitor data when checking untracked files.

When writing the index, fsmonitor_last_update is used to determine if the fsmonitor bitmap should be created and the extension data written to the index. When running through unpack-trees, this is not copied to the result index. As a result, the next git command has to do all the work of lstat()ing all files to determine what is clean, because all entries in the index are marked as dirty when no fsmonitor data was saved in the index extension. Copying fsmonitor_last_update to the result index will cause the extension data for fsmonitor to be in the index for the next git command to use.

Signed-off-by: Kevin Willford <[email protected]>
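The unpack-trees part of this commit message can be illustrated with a small model. It is a sketch under assumptions: the struct keeps only the one field the message names, a value of 0 is assumed to mean "no fsmonitor data", and the three helper functions are hypothetical names, not Git's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for Git's struct index_state. */
struct index_state {
	uint64_t fsmonitor_last_update; /* assumed: 0 means no fsmonitor data */
};

/* Model of the write-time decision: no last update, no extension. */
static int should_write_fsmonitor_extension(const struct index_state *istate)
{
	return istate->fsmonitor_last_update != 0;
}

/* Assumed "before" behavior: the result index starts without the value. */
static void prepare_result_old(const struct index_state *src,
			       struct index_state *result)
{
	(void)src;
	result->fsmonitor_last_update = 0;
}

/* Assumed "after" behavior: the value is carried over to the result. */
static void prepare_result_new(const struct index_state *src,
			       struct index_state *result)
{
	result->fsmonitor_last_update = src->fsmonitor_last_update;
}
```

Under the old behavior in this model, the index written after a checkout or reset has no fsmonitor extension, so the following command starts with every entry dirty; carrying the value over avoids that.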

The fsmonitor script that can be used for running all the git tests using watchman was causing some of the tests to fail because it wrote to stderr and created some files for debugging purposes. Add a new debug script to use with debugging and modify the other script to remove the code that would cause tests to fail.

Signed-off-by: Kevin Willford <[email protected]>

Includes these pull requests: #1 #6 #10 #11 #157 #212 #260 #270 Signed-off-by: Derrick Stolee <[email protected]>

kewillford force-pushed the test_status_perf branch 5 times, most recently from 7b90325 to 2da1e29 Compare October 30, 2019 17:42

kewillford force-pushed the test_status_perf branch from 2da1e29 to 392d797 Compare October 31, 2019 17:36

kewillford changed the title ~~[WIP] status after unpack_trees~~ fsmonitor updates for improved performance Oct 31, 2019

kewillford marked this pull request as ready for review October 31, 2019 17:40

kewillford requested review from derrickstolee, dscho and jeffhostetler October 31, 2019 17:53

kewillford self-assigned this Oct 31, 2019

dscho approved these changes Nov 1, 2019

wilbaker reviewed Nov 1, 2019

wilbaker approved these changes Nov 4, 2019

kewillford force-pushed the test_status_perf branch 2 times, most recently from 4539b52 to 843cf19 Compare November 6, 2019 15:43

kewillford force-pushed the test_status_perf branch from 843cf19 to 45814c2 Compare November 6, 2019 18:42

derrickstolee changed the base branch from features/sparse-checkout-2.23.0 to features/sparse-checkout-2.24.0 November 6, 2019 18:58

derrickstolee approved these changes Nov 7, 2019

derrickstolee changed the base branch from features/sparse-checkout-2.24.0 to vfs-2.24.0 November 12, 2019 18:54

kewillford added 3 commits November 21, 2019 10:21

kewillford force-pushed the test_status_perf branch from 45814c2 to 87c1478 Compare November 21, 2019 19:06

dscho added a commit that referenced this pull request May 14, 2024: Merge updates to serialized status (130ec0d)
dscho added a commit that referenced this pull request Jun 3, 2024: Merge updates to serialized status (36958c7)
dscho added a commit that referenced this pull request Jul 17, 2024: Merge updates to serialized status (980f286)
dscho added a commit that referenced this pull request Jul 17, 2024: Merge updates to serialized status (0c158e3)
dscho added a commit that referenced this pull request Jul 17, 2024: Merge updates to serialized status (77a1bf6)
dscho added a commit that referenced this pull request Jul 18, 2024: Merge updates to serialized status (f70cc78)
mjcheetham pushed a commit that referenced this pull request Jul 23, 2024: Merge updates to serialized status (84a7bc2)
dscho added a commit that referenced this pull request Jul 25, 2024: Merge updates to serialized status (09973b4)
mjcheetham pushed a commit that referenced this pull request Jul 29, 2024: Merge updates to serialized status (ea134a8)
dscho added a commit that referenced this pull request Sep 18, 2024: Merge updates to serialized status (97eaa01)
dscho added a commit that referenced this pull request Sep 24, 2024: Merge updates to serialized status (bdc5e99)
dscho added a commit that referenced this pull request Oct 8, 2024: Merge updates to serialized status (9b44aaa)
mjcheetham pushed a commit that referenced this pull request Dec 3, 2024: Merge updates to serialized status (aca1a51)
dscho added a commit that referenced this pull request Dec 17, 2024: Merge updates to serialized status (7497722)
dscho added a commit that referenced this pull request Dec 18, 2024: Merge updates to serialized status (df9a48a)
dscho added a commit that referenced this pull request Jan 1, 2025: Merge updates to serialized status (c6bb230)
dscho added a commit that referenced this pull request Jan 1, 2025: Merge updates to serialized status (0f653e1)
dscho added a commit that referenced this pull request Jan 1, 2025: Merge updates to serialized status (478e8de)
dscho added a commit that referenced this pull request Feb 10, 2025: Merge updates to serialized status (ff2be29)
dscho added a commit that referenced this pull request Feb 27, 2025: Merge updates to serialized status (0ce6759)
dscho added a commit that referenced this pull request Mar 5, 2025: Merge updates to serialized status (72361e6)
mjcheetham pushed a commit that referenced this pull request Mar 12, 2025: Merge updates to serialized status (b8e619a)
mjcheetham pushed a commit that referenced this pull request Mar 17, 2025: Merge updates to serialized status (1c5d9f8)
dscho added a commit that referenced this pull request Jun 6, 2025: Merge updates to serialized status (c65f653)
dscho added a commit that referenced this pull request Jun 11, 2025: Merge updates to serialized status (8e6f975)
dscho added a commit that referenced this pull request Jun 13, 2025: Merge updates to serialized status (207b3c5)
dscho added a commit that referenced this pull request Jun 16, 2025: Merge updates to serialized status (7f0b21a)
dscho added a commit that referenced this pull request Jun 16, 2025: Merge updates to serialized status (f220266)
dscho added a commit that referenced this pull request Jun 16, 2025: Merge updates to serialized status (9b373b2)
dscho added a commit that referenced this pull request Jul 8, 2025: Merge updates to serialized status (2a231d8)

Each of these commits carries the same message body: Includes these pull requests: #1 #6 #10 #11 #157 #212 #260 #270 Signed-off-by: Derrick Stolee <[email protected]>
