Skip to content

Default to histogram diff #23257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

Maxattax97
Copy link

This resolves, see that ticket for further details #23255

Comment on lines +245 to +250
if CheckGitVersionAtLeast("1.8.2") == nil {
// The `histogram` algorithm itself was added in 1.7.7, but `diff.algorithm` was supported in Git 1.8.2.
if err := configSet("diff.algorithm", "histogram"); err != nil {
return err
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if CheckGitVersionAtLeast("1.8.2") == nil {
// The `histogram` algorithm itself was added in 1.7.7, but `diff.algorithm` was supported in Git 1.8.2.
if err := configSet("diff.algorithm", "histogram"); err != nil {
return err
}
}
if err := configSet("diff.algorithm", "histogram"); err != nil {
return err
}

The minimal required git version is already 2.0

However, I can't LGTM yet as I don't know what the side effects are.
Does it need more memory?
More CPU time?
Produce worse outputs?
Or why isn't it the default?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much of these questions are answered in this post: https://stackoverflow.com/questions/32365271/whats-the-difference-between-git-diff-patience-and-git-diff-histogram

Does it need more memory?

There has been a short history of bad memory usage that has since been fixed:

Note: "git diff --histogram" had a bad memory usage pattern, which has been rearranged to reduce the peak usage, with Git 2.19 (Q3 2018).

Note: the patience and histogram algorithms had memory leaks, fixed with Git 2.36 (Q2 2022)

I can bump the version requirement if that's desirable.

More CPU time?

It appears to be faster than both Myers and Patience.

Commit 8c912ee actually introduced --histogram to diff:
Port JGit's HistogramDiff algorithm over to C. Rough numbers (TODO) show that it is faster than its --patience cousin, as well as the default Meyers algorithm.

8c912ee (teach --histogram to diff, 2011-07-12) claimed histogram diff was faster than both Myers and patience.

Produce worse outputs?

Based on the research article I posted in the ticket, it consistently produces better outputs than Myers. I'm broadly summarizing, but this is measured by finding over x2 more bugs in aggregate in the research.

Why isn't it the default?

... uhh, idk. It's newer? 🤷‍♂️

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Mar 2, 2023
@silverwind
Copy link
Member

Can we add a test for this that confirms the diff algorithm was switched? It would be valuable.

@techknowlogick techknowlogick added this to the 1.20.0 milestone Mar 2, 2023
@adlternative
Copy link

I think many times users expect the diff on the server to match the diff on the client.
The current default algorithm for git is Myers, so just let gitea server keep it.

And I think a better approach for this requirement may be to add configuration options
for the diff algorithm to the server.

e.g.

[diff]
ALGORITHM=histogram

@silverwind
Copy link
Member

silverwind commented Mar 9, 2023

Maybe we should just make this a docs-only change where we show how the server's gitconfig can be modified (I recommend doing it in user config $RUN_USER/.gitconfig, but editing the system config (/etc/gitconfig) also works.

I think under some circumstances, gitea may write to $RUN_USER/.gitconfig, so system config might be safer. Docker setups may need a change to allow mounting this file into the container.

@adlternative
Copy link

Maybe we should just make this a docs-only change where we show how the server's gitconfig can be modified (I recommend doing it in user config $RUN_USER/.gitconfig, but editing the system config (/etc/gitconfig) also works.

I think under some circumstances, gitea may write to $RUN_USER/.gitconfig, so system config might be safer. Docker setups may need a change to allow mounting this file into the container.

It is of course best if you can configure .gitconfig directly :)

@silverwind
Copy link
Member

silverwind commented Mar 9, 2023

I'd be willing to approve this if there is a way for users to alter the algorithm we set. We could introduce a config option git.DIFF_ALGORITHM for this. This option should also accept a value default which means we let git's default take effect.

@wxiaoguang
Copy link
Contributor

Instead of adding more Gitea options, I think we can have a section like [git.config] , then Gitea could set these options to git config, then users could fine tune any option they like.

eg:

# in Gitea's app.ini
[git.config]
diff.algorithm = histogram

@silverwind
Copy link
Member

silverwind commented May 22, 2023

Could also make it docs-only where we suggest common gitconfig adjustments on the server. More of them that I currently set:

[receive]
  advertisePushOptions = true
  certNonceSeed = "randomstring"
[diff]
  algorithm = histogram

advertisePushOptions and certNonceSeed are for #13454, so if we document these, that issue will also be solved.

@silverwind
Copy link
Member

silverwind commented May 22, 2023

Thought I would not mind a gitea config section for it. It might make it easier in case of docker and such where one would have to mount files like /etc/gitconfig or ~/.gitconfig into the container otherwise.

@wxiaoguang wxiaoguang added the issue/needs-feedback For bugs, we need more details. For features, the feature must be described in more detail label May 22, 2023
@wxiaoguang
Copy link
Contributor

-> Support user customized git config, use diff.algorithm=histogram by default #24860

@wxiaoguang
Copy link
Contributor

Thank you very much for your PR, now #24860 provides a more general approach for git config system, and now Gitea has diff.algorithm=histogram by default.

@wxiaoguang wxiaoguang closed this May 23, 2023
@GiteaBot GiteaBot removed this from the 1.20.0 milestone May 23, 2023
@go-gitea go-gitea locked as resolved and limited conversation to collaborators Aug 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
issue/needs-feedback For bugs, we need more details. For features, the feature must be described in more detail lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants