Skip to content

midx: verify: add midx packfiles to the packed_git list #121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jeffhostetler
Copy link

Fix "git multi-pack-index verify" to handle repos with thousands
of packfiles.

Midx verify adds the individual "packed_git" structures to the
multi_pack_index.packs array, but it does not add them to the
"repository.objects.packed_git" list. During the verification
code, each packfile is opened and scanned. And "pack_open_fds"
is incremented. If "pack_open_fds" equals the "pack_max_fds"
open_packed_git_1() calls close_one_pack() to LRU-style close
an already open packfile. But because the packfiles were never
added to the "packed_git" list, close_one_pack() does nothing.
If there are very many packfiles, Git runs out of file descriptors
and fails.

Note that this was observed on Windows when build with GCC and
in a repository with more than (2048-25) packfiles.

Signed-off-by: Jeff Hostetler [email protected]

Fix "git multi-pack-index verify" to handle repos with thousands
of packfiles.

Midx verify adds the individual "packed_git" structures to the
multi_pack_index.packs array, but it does not add them to the
"repository.objects.packed_git" list.  During the verification
code, each packfile is opened and scanned.  And "pack_open_fds"
is incremented.  If "pack_open_fds" equals the "pack_max_fds"
open_packed_git_1() calls close_one_pack() to LRU-style close
an already open packfile.  But because the packfiles were never
added to the "packed_git" list, close_one_pack() does nothing.
If there are very many packfiles, Git runs out of file descriptors
and fails.

Note that this was observed on Windows when build with GCC and
in a repository with more than (2048-25) packfiles.

Signed-off-by: Jeff Hostetler <[email protected]>
@jeffhostetler jeffhostetler merged commit be59e6f into microsoft:vfs-2.20.1 Feb 27, 2019
dscho pushed a commit that referenced this pull request Feb 28, 2019
midx: verify: add midx packfiles to the packed_git list
@derrickstolee
Copy link

@jeffhostetler sorry that this was a pain point, and that I didn't see it until you posted the patches to the mailing list.

This makes me think there is a bigger issue with the midx during a normal read: if the user ran git log --oneline >/dev/null, then they would probably open all of their packfiles while reading the commits. We purposefully do not add the packs to the packed_git list to avoid duplicate pack reads on object misses. My gut feeling is that this file-descriptor limit can happen during normal read operations. Running git commit-graph write may hit this, too, when no commit-graph exists.

Likely the full solution requires these steps:

  1. Call install_packed_git() in prepare_midx_pack(). This fixes the bug, but leads to performance regressions.
  2. Add an unsigned multi_pack_index : 1 flag in struct packed_git and set it to 1 in prepare_midx_pack().
  3. Ignore packs with the multi_pack_index flag while looping through packed_git in these methods:
    ** approximate_object_count()
    ** find_pack_entry()
    ** find_short_packed_object()
  4. Remove all references to the all_packs and rewrite get_all_packs() to simply ensure that all midx packs are added to the packed_git struct and return packed_git instead.

That sounds like a lot of work, and I'm sorry for my oversight on this file handle limit.

@jeffhostetler
Copy link
Author

@derrickstolee Thanks for the info. And NP.

I think all that makes sense. I had tried several different ways to get the midx packfiles properly
accounted for in the pack_open_fds computation (and getting a nicely circular list at one point
because of the way all_packs is built from the 2 distinct sets of packfiles).

The 2 line fix here helped untangle that, but yes getting rid of the 2 distinctions and having a
single list would make things easier to understand.

The fix here only affects midx verify and seemed safe for the hotfix. My next round of changes
in #124 sorts the object list by packfile and lets us do the
verify with only 1 packfile open at a time. This effectively eliminates the need for the fix in this
PR.

I think I'd like to drop the contents of this PR from the upstream PR
gitgitgadget#166 and (for now) just upstream the packfile
sort and address the all_packs in a later series.

Thanks!

@derrickstolee
Copy link

@jeffhostetler you are absolutely right to do this fix first, as it is definitely causing issues. I mention a larger fix to take care of all possible issues, but we may not have hit them in the wild (although, git commit-graph verify may hit this, too).

Another way to try and fix this is to add the PackfileMaintenanceStep back to the Git maintenance scheduler to reduce the number of packs.

@jeffhostetler jeffhostetler deleted the hotfix-midx-verify branch July 1, 2019 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants