Tree iteration performance #636
Comments
After running pprof, it seems we waste time (not only there) in ReadByte in the teeReader struct from
Before:
After:
The results depend on the computer load, but I get something between 2 and 3 seconds faster; I tested with 10942 files. Another point where the program seems to get stuck is when a file is a bit heavy, I mean something around a megabyte: it takes a while to process. When I run git I get a result in less than a second, so I guess there are many ways to improve that.
I wanted to report unusably slow performance on calculation of
This is probably the same root cause. I took a profile, but nothing stood out to me as the obvious thing to fix. Here's the SVG CPU profile in case it's helpful.
This is the code I'm using to get the list of files in a commit. I saw the code in #562 and will try using that to get this information instead.

    // commitFilenames returns the filenames contained by the given commit.
    func commitFilenames(commit, parent *object.Commit) []string {
        patch, err := commit.Patch(parent)
        glog.FatalIf(err)
        var filenames []string
        for _, filePatch := range patch.FilePatches() {
            _, to := filePatch.Files()
            if to != nil {
                filenames = append(filenames, to.Path())
            }
            // Ignore file deletions
        }
        return filenames
    }

Thanks for all your work on this great project!
@tajtiattila if you only need filenames and hashes, you'd better use:

    tree, err := commit.Tree()
    if err != nil {
        // handle error
    }
    seen := make(map[plumbing.Hash]bool)
    iter := object.NewTreeWalker(tree, true, seen)
    for {
        name, entry, err := iter.Next()
        if err == io.EOF {
            break // end of tree, not a failure
        }
        if err != nil {
            // handle error
        }
        fmt.Println(name, entry.Hash)
    }

It will work much, much faster because
Still slower than git though:
@smacker Thank you. The docs for func (c *Commit) Files() (*FileIter, error) suggest FileIter is just another way to get the tree, so I hadn't even considered TreeIter. I did not know FileIter is inherently slower than TreeIter. Does it have to be?
I'm trying to iterate a tree and list filenames and their hashes within a repo, using https://gist.github.com/tajtiattila/f7357e2536c5ae5b411607ebfa71d5e9
However the performance seems to be at least two orders of magnitude slower than that of Git. Is there a better way to do it?
My intent was to use go-git to check changed files in a custom pre-commit hook for git.
The OS is Windows 10. The repo is on an SSD and is large, with binary files in it.