-
-
Notifications
You must be signed in to change notification settings - Fork 345
gix
fails to decode a tree object that git
has no trouble with
#1096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Note, this feels similar to the problem I reported in #950. |
I have seen this sort of tree parsing error arise with |
Thanks a lot for this fantastic report! I see how much time you must have spent not only to compile it, but also to debug the issue at hand down to this point. I also see how there is two issues here, one regarding For the first of the two, a test was always missing, presumably, so it was missed during the I will definitely keep you posted about the progress on this matter. |
Fix the slow and the fast-path tree-parsers to be able to cope with a greater variety of trees.
Could you also elaborate where I am asking because doing so allocates a vector, and maybe the |
Fix the slow and the fast-path tree-parsers to be able to cope with a greater variety of trees.
Fix the slow and the fast-path tree-parsers to be able to cope with a greater variety of trees.
The fix is now available in the linked PR, and it should land in It would be great if you could validate it's working for all repositories now, and if not I am happy for more samples that won't parse. Thank you. |
The specific call is in here. The code iterates over the object IDs in the ODB, reads each header, and switches on object type. In the tree-handling case, What's this Some background: Nosey Parker detects secrets in Git repos by scanning each blob. Since its v0.14.0 release in August, it now additionally investigates commit and tree objects within Git repositories in order to calculate detailed metadata for each blob it scans. This metadata for each blob includes the commit that first introduced it and the pathname it appeared with. It uses a novel graph algorithm (?) to efficiently determine this metadata for all blobs, many orders of magnitude faster than you can with regular The big idea in the algorithm is to build an in-memory representation of the inverted Git commit graph (allowing quick lookup of child commits), then traverse that in topological order, determining for each commit the set of blobs that were first introduced there, with respect to all other commits in the repository. The implementation uses an adaptation of Kahn’s algorithm for topological traversal, combined with a priority queue ordered by commit node out-degree. The priority queue approach heuristically minimizes memory use by keeping the work list of nodes that still need to be visited small. Additionally, the implementation uses data representations that provide good CPU cache behavior and constant factors, such as compressed bitmaps to keep track of which blobs have been seen at each commit. |
@Byron thank you for the very fast response and patch! What was the root of the problem? Was that tree object badly formed somehow? I've tested Nosey Parker with the From https://github.com/xilense/Home-AssistantConfig:
From https://github.com/akram/openclassifieds2:
|
In this line it shows that ultimately the code wants to iterate entries, so there is no need to allocate a Instead, Thanks so much for sharing the algorithmic approach that's implemented in NoseyParker, I am loving it! Even though it's way above my head, it gives me hope that some of these algorithms can also be used to speed up
I don't know actually. There were two parser implementations, one with Thus I am curious what these other failing trees look like - I will let you know once I know more. |
I did the fix properly this time and fixed another shortcoming that would have bitten some other time, I am sure. With that debt removed, I'd hope these kinds of failures are nothing you will see ever again. |
I have updated a local development copy of Nosey Parker to use the latest Anyway, I ran over my 7500 input repos again, and this time saw no tree parsing errors. Thank you @Byron! I'll get those changes on the next |
I am glad to hear that!
Yes, those really had to happen to remove the hopefully last 'wart' from one of the earliest crates in the family. These sometimes have terrible shortcuts in them as back then, my own quality standards for what was a pet-project were much lower. |
I've also switched to using |
Absolutely! If you ever run into a profile that indicates something in |
Current behavior 😯
There are certain Git repositories that contain tree objects that
gix
fails to decode, but whichgit
itself has no trouble with.For example, from a test program included in this issue:
Additionally, it appears that the
verbose-object-parsing-errors
feature ofgix-object
is broken when usinggix
newer than 0.51.0 andgix-object
newer than 0.33. Rather than getting a detailed error message when an object fails to parse, the error message isNone
, or rendered as an empty string. This seems to coincide with switching away fromnom
for object parsing.Expected behavior 🤔
gix
should correctly parse the tree objects.Git behavior
git
seems to have no trouble!Steps to reproduce 🕹
For example, see https://github.com/jsbin/jsbin, which has a few problematic trees:
5e22e05094a0626934cb2f7319e92da2d343e726
7dde39c37eea1cc6f81a04ba50e2ef55333c6966
ad20e1886a324465093f656a0910b9fe00429977
e0d917771d3c3ffb7782158d9f33b2b7e9c6c524
Locally, I cloned the repo to
/disk1/clones/jsbin/jsbin
usinggit clone --mirror
.We can inspect the second problematic tree (
7dde39c37eea1cc6f81a04ba50e2ef55333c6966
) with Git. It is the root tree of this commit: jsbin/jsbin@884c72eGit says its payload is 659 bytes long:
I wrote a small
gix
test program that demonstrates the problem.Cargo.toml:
src/main.rs:
Running this for the second bad tree object noted above, we get this:
The text was updated successfully, but these errors were encountered: