-
Notifications
You must be signed in to change notification settings - Fork 98
fsmonitor updates for improved performance #212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7b90325
to
2da1e29
Compare
2da1e29
to
392d797
Compare
Here are some of the performance differences with these changes
|
We need to figure something out about how fsmonitor talks specifically to watchman. We are not robust to script-level frequency (my test is on v2.24.0-rc2). The For example:
In test 5, the following commands are run in order:
and here are the logs for those two lines:
The second We will need to think about how to make our Watchman integration more robust and set up some automation to run the test suite with Watchman specifically. cc: @kewillford, @jrbriggs, @wilbaker, @dscho |
@derrickstolee Can you run this with GIT_TRACE_FSMONITOR turned on too? Not sure if that'll help or not, but worth a shot. D'oh, I just noticed that you did have that turned on in the above command line. It there anything in that other log file? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
I asked for a few clarifications, and suggest to split the change that copies the last_update
when copying an index into its own commit, but none of these are super-important.
But please add your sign-off to the commit messages.
@@ -164,6 +166,8 @@ EOF | |||
|
|||
# test that newly added files are marked valid | |||
test_expect_success 'newly added files are marked valid' ' | |||
write_script .git/hooks/fsmonitor-test<<-\EOF && | |||
EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I do not understand this diff hunk. previously, we let the test fsmonitor tell Git to look at the new files, but now we don't?
Is this a diff hunk that is not exactly necessary for the test case to pass, but merely to make the test more accurate, by disallowing fsmonitor to trigger lstat()
s on the three new files and instead forcing it to rely on the implicit information given by the git add` commands?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. The issue is that with these changes the fsmonitor data is refreshed when any command reads the index. Left unchanged the git ls-files
would refresh and be using the dirty files from the previous test which would include the newly added files and they would be marked as dirty which would fail this test. Perhaps I can use the test-tool dump-fsmonitor
to get the bitmap to compare against instead of using ls-files
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up question here: what behavior is this test specifically trying to validate?
The comment says:
test that newly added files are marked valid
But based on the change you made it seems like newly added files will not always be marked as valid (i.e. if they were dirty before they will still be considered dirty after the git add
).
Is there there something I'm missing (e.g. was the test .git/hooks/fsmonitor-test
behaving incorrectly, and that's why it needs to be set to empty at the start of the test)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe for this test the
write_script .git/hooks/fsmonitor-test<<-\EOF &&
EOF
needs to be before the git ls-files
so that the git add
runs with the paths in .git/hooks/fsmonitor-test
but for each git add
the single path for the added file would need to be in .git/hooks/fsmonitor-test
otherwise the last git add
would mark the other paths as dirty and save that index out.
I'm taking a closer look at the tests because before the content of the .git/hooks/fsmonitor-test
did not affect commands that were not refreshing cache entries whereas now any commands that reads the index will get the entries marked dirty that are in the .git/hooks/fsmonitor-test
. This means that even git ls-files
which is being used to validate the CE_FSMONITOR_VALID could have that affected by what is in .git/hooks/fsmonitor-test
. So if left with all the paths in .git/hooks/fsmonitor-test
, those paths will be dirty for every git command that reads the index.
So in most cases we need to make sure .git/hooks/fsmonitor-test
is empty for any validating commands like git ls-files
so that the contents of that file will not change what cache entries are dirty.
I might also try using the test-dump-fsmonitor
since that it only dumping the index entries before applying the bitmap or the .git/hooks/fsmonitor-test
paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or each git add the single path for the added file would need to be in .git/hooks/fsmonitor-test otherwise the last git add would mark the other paths as dirty and save that index out.
Ahh, this is the part I was missing, thanks!
dirty_repo && | ||
git add . && | ||
write_script .git/hooks/fsmonitor-test<<-\EOF && | ||
EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So basically the change in 'newly added files are marked valid'
made it so that the fsmonitor is not allowed to tell Git to look at any file's stat in all the test cases up until here. Hmm. I would really like to have at least a paragraph in the commit message providing a compelling argument why that is a good thing.
And then I really don't understand why we have to have the full fsmonitor-test
script again, but only for dirty_repo
and for git add .
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good, same questions as @dscho on the updates to the tests.
4539b52
to
843cf19
Compare
@kewillford when you get this rebased on top of |
843cf19
to
45814c2
Compare
@kewillford I kicked off a new build for you here. |
3444ec2 ("fsmonitor: don't fill bitmap with entries to be removed", 2019-10-11) added a handful of sanity checks that make sure that a bit position in fsmonitor bitmap does not go beyond the end of the index. As each bit in the bitmap corresponds to a path in the index, this is the right check most of the time. Except for the case when we are in the split-index mode and looking at a delta index that is to be overlayed on the base index but before the base index has actually been merged in, namely in read_ and write_fsmonitor_extension(). In these codepaths, the entries in the split/delta index is typically a small subset of the entire set of paths (otherwise why would we be using split-index?), so the bitmap used by the fsmonitor is almost always larger than the number of entries in the partial index, and the incorrect comparison would trigger the BUG(). Signed-off-by: Kevin Willford <[email protected]>
When using fsmonitor the CE_FSMONITOR_VALID flag should be checked when wanting to know if the entry has been updated. If the flag is set the entry should be considered up to date and the same as if the CE_UPTODATE is set. In order to trust the CE_FSMONITOR_VALID flag, the fsmonitor data needs to be refreshed when the fsmonitor bitmap is applied to the index in tweak_fsmonitor. Since the fsmonitor data is kept up to date for every command, some tests needed to be updated to take that into account. istate->untracked->use_fsmonitor was set in tweak_fsmonitor when the fsmonitor bitmap data was loaded and is now in refresh_fsmonitor since that is being called in tweak_fsmonitor. refresh_fsmonitor will only be called once and any other callers should be setting it when refreshing the fsmonitor data so that code can use the fsmonitor data when checking untracked files. When writing the index, fsmonitor_last_update is used to determine if the fsmonitor bitmap should be created and the extension data written to the index. When running through unpack-trees this is not copied to the result index. This makes the next time a git command is ran do all the work of lstating all files to determine what is clean since all entries in the index are marked as dirty since there wasn't any fsmonitor data saved in the index extension. Copying the fsmonitor_last_update to the result index will cause the extension data for fsmonitor to be in the index for the next git command to use. Signed-off-by: Kevin Willford <[email protected]>
The fsmonitor script that can be used for running all the git tests using watchman was causing some of the tests to fail because it wrote to stderr and created some files for debugging purposes. Add a new debug script to use with debugging and modify the other script to remove the code that would cause tests to fail. Signed-off-by: Kevin Willford <[email protected]>
45814c2
to
87c1478
Compare
This change does two main things.
CE_FSMONITOR_VALID
flag in thece_uptodate
macro so that whenever the code is checking if any entry is up to date the fsmonitor flag will be taken into consideration.