Suggest update the branch when behind too many commits #1942

Merged
merged 1 commit into from
Apr 24, 2025

Conversation

xizheyin
Contributor

@xizheyin xizheyin commented Apr 17, 2025

As discussed in #t-compiler/rustc-dev-guide > A reminder in the doc to try to work with `rust-lang/rust`, I added this feature. I haven't tested it yet (that may take some time). Do I need to test it locally?

cc @Kobzol

@rustbot
Collaborator

rustbot commented Apr 17, 2025

Error: Invalid triagebot.toml at position 14:2:

TOML parse error at line 14, column 2
   |
14 | [pr-behind-commits]
   |  ^^^^^^^^^^^^^^^^^
unknown field `pr-behind-commits`, expected one of `relabel`, `assign`, `ping`, `nominate`, `prioritize`, `major-change`, `glacier`, `close`, `autolabel`, `notify-zulip`, `github-releases`, `review-submitted`, `review-requested`, `shortcut`, `note`, `mentions`, `no-merges`, `validate-config`, `pr-tracking`, `transfer`, `merge-conflicts`, `bot-pull-requests`, `rendered-link`, `canonicalize-issue-links`, `issue-links`, `no-mentions`

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

@xizheyin xizheyin force-pushed the pr-behind-commits branch from a100080 to 5e7ef06 Compare April 17, 2025 08:18
@rustbot
Collaborator

rustbot commented Apr 17, 2025

Error: Invalid triagebot.toml at position 14:2:

TOML parse error at line 14, column 2
   |
14 | [pr-behind-commits]
   |  ^^^^^^^^^^^^^^^^^
unknown field `pr-behind-commits`, expected one of `relabel`, `assign`, `ping`, `nominate`, `prioritize`, `major-change`, `glacier`, `close`, `autolabel`, `notify-zulip`, `github-releases`, `review-submitted`, `review-requested`, `shortcut`, `note`, `mentions`, `no-merges`, `validate-config`, `pr-tracking`, `transfer`, `merge-conflicts`, `bot-pull-requests`, `rendered-link`, `canonicalize-issue-links`, `issue-links`, `no-mentions`

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

Contributor

@Kobzol Kobzol left a comment

Hi, thanks for the PR! We should definitely test this, but I can do it on my test repo.

Since this check essentially inspects commits, and should ideally run after each push (unless we find the API call to be too expensive), could you please move it to the check_commits handler?

It would be nice to introduce some [check_commits] config option that would contain the individual commit checks as attributes, and migrate the existing commit checks to it (CC @Urgau, WDYT?), but that doesn't need to happen in this PR.

I wonder how this interacts with merge commits, as found e.g. in rust-lang/rust. We should check if behind_by only includes merge commits or all commits.

@xizheyin
Contributor Author

Yeah, thanks for the guidance, I'll move it to check_commits.

I'm calling GitHub's compare API here, which should take all of the commits into account. Is there anything special about merge commits? Do they need to be considered separately?

@Kobzol
Contributor

Kobzol commented Apr 17, 2025

Well, the history isn't linear, so "number of commits between A and B" is not something that can be answered unambiguously.

@xizheyin xizheyin force-pushed the pr-behind-commits branch from e49a5a1 to 85c7114 Compare April 17, 2025 09:45
@xizheyin
Contributor Author

xizheyin commented Apr 17, 2025

Oh, you mean that, for example, an Auto merge commit can contain multiple PR commits? Considering only GitHub's diff might be an acceptable tradeoff. We can control the approximate range by setting the threshold. I'm concerned that a more granular analysis might take up too many compute resources.

@xizheyin xizheyin force-pushed the pr-behind-commits branch 3 times, most recently from 7121324 to 28e4c04 Compare April 17, 2025 15:00
@xizheyin
Contributor Author

In my latest change, I eliminated Auto-merge vs Rollup-merge commits, which should be more fair.

For fetching the commits to compare, I didn't implement pagination, because I think a single page can return up to 250 commits, which should be much higher than our threshold. To save on queries, I only make one request.
@Kobzol

@Urgau
Member

Urgau commented Apr 17, 2025

It would be nice to introduce some [check_commits] config option that would contain the individual commit checks as attributes, and migrate the existing commit checks to it

Some of them have configs of their own, like [no_merges]. Won't it be a bit awkward to configure them if they were also under a single [check_commits] config? We could prefix the individual configs (like no_merges_labels and so on), but that seems weird and unnecessary.

@Kobzol
Contributor

Kobzol commented Apr 17, 2025

Ideally I would like to merge all of these checks under a single config option, to have it be more consistent, but it may not be worth the churn. We could also support both configs and incrementally migrate the repos to the new form, but again, it's probably not worth it.

@xizheyin
Contributor Author

I tried it yesterday and it seems like there was a slight problem with the macro expansion, I think this change may require a separate PR. I would like to do it if possible.

@Kobzol
Contributor

Kobzol commented Apr 18, 2025

Just to clarify on my earlier comments, we should only take "root" merge commits into account, so essentially merged PRs. A single PR can contain hundreds of commits (for example sync PRs from subtrees), so if we had a limit of 100 commits, it could trigger after a single PR, which is no good.

So really we want to have a notion of "you are N PRs behind". That being said, each merged PR is a master commit with two parents: the first one is the previous merge and the second one is the tip of the merged PR. I would kind of expect that maybe behind_by only "traverses" the first parents? That's what I wanted to check.
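To make the "N PRs behind" idea concrete, here is a minimal sketch that walks only first parents through a toy in-memory history. The `Commit` struct, the `sample_history` data, and the SHA-like names are hypothetical and exist only for illustration; this is not triagebot code and makes no claim about how `behind_by` is computed.

```rust
use std::collections::HashMap;

/// A commit in a toy history: only its parent IDs, first parent first.
/// (Hypothetical type, for illustration only.)
struct Commit {
    parents: Vec<&'static str>,
}

/// Walks only first parents from `tip` and returns the number of steps needed
/// to reach `base`, or `None` if `base` is not on the first-parent chain.
/// When every step is a merge commit, the result is "number of merged PRs".
fn first_parent_distance(
    history: &HashMap<&'static str, Commit>,
    tip: &str,
    base: &str,
) -> Option<usize> {
    let mut current = tip;
    let mut steps = 0;
    while current != base {
        let commit = history.get(current)?;
        // Follow the first parent only; the second parent is the PR's own tip.
        current = *commit.parents.first()?;
        steps += 1;
    }
    Some(steps)
}

/// Toy history: m0 <- m1 (merges PR 1, two commits) <- m2 (merges PR 2).
fn sample_history() -> HashMap<&'static str, Commit> {
    let mut history = HashMap::new();
    history.insert("m0", Commit { parents: vec![] });
    history.insert("pr1-a", Commit { parents: vec!["m0"] });
    history.insert("pr1-tip", Commit { parents: vec!["pr1-a"] });
    history.insert("m1", Commit { parents: vec!["m0", "pr1-tip"] });
    history.insert("pr2-tip", Commit { parents: vec!["m1"] });
    history.insert("m2", Commit { parents: vec!["m1", "pr2-tip"] });
    history
}
```

In this toy history, a PR based on `m0` is two merges behind `m2`, no matter how many commits each merged PR contained.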

@xizheyin
Contributor Author

Thank you, I got it. We need to count merged PRs and not just commits. I'll revise it.

@Kobzol
Contributor

Kobzol commented Apr 18, 2025

Brainstorming:

I did some experiments and it seems that the behind_by attribute is mostly useless to us, as it seems to count all commits. What we could do instead is to run gh api "repos/rust-lang/rust/[email protected]" to go through all recent merge commits, and try to see if the PR's base SHA is in them and on which position. But the base SHA is actually not the right thing that we need (see below).

I checked if we can get something from the PR details (which we can fetch, although they should also be sent proactively in the PR status change webhook). They contain an attribute base.sha, which tells us the SHA of the master commit on which the PR is based. But this SHA is actually reset (to the latest master) on each push to the PR, so that doesn't really help us.

So, another idea: we can take the SHA of the PR, and ask for the details of that commit (gh api repos/kobzol/rust/commits/340272e54ed731ed827f991d75afc9db8aa21c50), which has a parents field. So we could look up the first parent of the PR's SHA, and ask GH again, and see what is the date of that parent commit, and if it's too old, then we can warn.

But when I thought of this solution, I realized that we might not even need the parent commit at all, if we only take a look at the date? When PR HEAD commit date is older than N days, we know that it hasn't been rebased onto the latest master at the user's PC. If they do rebase, the author date will remain the same, but the commit date will be updated. Hmm, no, that wouldn't necessarily work, because even if you rebase, you might still rebase onto an old local master, so even though the commit date will be new, it doesn't mean that you have rebased onto a fresh master commit.

So perhaps the approach with checking the HEAD commit and its parent, and then warn if the parent is older than N days might be the way to go.
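A minimal sketch of the "warn if the parent is older than N days" idea, assuming the parent commit's date has already been fetched from GitHub and converted to a `SystemTime`. The function names and the warning text are hypothetical, not taken from triagebot.

```rust
use std::time::{Duration, SystemTime};

const SECONDS_PER_DAY: u64 = 24 * 60 * 60;

/// Returns how many whole days `commit_time` lies before `now`,
/// or `None` if the commit time is in the future.
fn age_in_days(commit_time: SystemTime, now: SystemTime) -> Option<u64> {
    let elapsed = now.duration_since(commit_time).ok()?;
    Some(elapsed.as_secs() / SECONDS_PER_DAY)
}

/// Returns a warning when the parent (base) commit is older than
/// `max_age_days`, and `None` otherwise. The message wording is illustrative.
fn stale_base_warning(
    parent_commit_time: SystemTime,
    now: SystemTime,
    max_age_days: u64,
) -> Option<String> {
    let elapsed = now.duration_since(parent_commit_time).ok()?;
    let limit = Duration::from_secs(max_age_days * SECONDS_PER_DAY);
    if elapsed > limit {
        let days = elapsed.as_secs() / SECONDS_PER_DAY;
        Some(format!(
            "This PR is based on an upstream commit that is {days} days old. \
             Consider updating the branch."
        ))
    } else {
        None
    }
}
```

Passing `now` in as a parameter, rather than calling `SystemTime::now()` inside, keeps the check deterministic and easy to test.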

@Urgau
Member

Urgau commented Apr 18, 2025

So perhaps the approach with checking the HEAD commit and its parent, and then warn if the parent is older than N days might be the way to go.

Seems reasonable to me.

@xizheyin
Contributor Author

xizheyin commented Apr 19, 2025

I updated the code. The current implementation uses a two-step approach to determine if a PR needs updating:

  1. Time-based Check (Preferred Method): Calls is_parent_commit_too_old to check whether the PR's parent commit is too old (default: older than 14 days). If it is, a warning message suggesting to update the branch is generated immediately. This is considered the more accurate method and is used first.

  2. Important Commits Check (Fallback Method): If the time check passes or errors, proceeds with the commit-based check. It gets the difference between the PR and the target branch (via the GitHub Compare API). We only count Auto-merge and Rollup-merge commits (important integration commits). If the number of missing important commits exceeds the threshold (default 100), a warning message is generated. This method focuses on whether the PR is missing important integration changes, ignoring regular commits.

Since there may be a collection of PRs in the rollup merge, the number we count will only be a little less than the actual PRs, so as to avoid disturbing the users. This is not a bad alternative.

This approach considers both the timeliness of the PR (time perspective) and whether the PR includes the latest important merge commits (functional perspective), providing a comprehensive assessment of whether the PR needs updating.

This is the implementation as I understand it, I don't know if I'm wrong.
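The commit-counting fallback described above could be sketched roughly as follows. This assumes integration commits can be recognized by the message prefixes `Auto merge of #` and `Rollup merge of #`; both the prefix check and the function names are assumptions made for illustration, not the PR's actual code.

```rust
/// Returns true for commit messages that look like integration commits.
/// The prefixes are an assumption about how such commits are titled.
fn is_integration_commit(message: &str) -> bool {
    message.starts_with("Auto merge of #") || message.starts_with("Rollup merge of #")
}

/// Counts the integration commits among the messages of the commits
/// the PR is missing from the target branch.
fn count_integration_commits<'a>(messages: impl IntoIterator<Item = &'a str>) -> usize {
    messages
        .into_iter()
        .filter(|message| is_integration_commit(message))
        .count()
}

/// Returns true when the PR is missing more integration commits than allowed.
fn is_too_far_behind<'a>(
    messages: impl IntoIterator<Item = &'a str>,
    threshold: usize,
) -> bool {
    count_integration_commits(messages) > threshold
}
```

Ordinary commits inside a merged PR are ignored, so only the two prefixed messages count toward the threshold.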

@Kobzol
Contributor

Kobzol commented Apr 22, 2025

Checking the commit date looks good. I would just use that. The fallback would be problematic, because some subtree syncs can contain fake/empty bors merge commits, and a single PR can contain tens of these. Detecting this based on the number of commits simply won't work well in rust-lang/rust, IMO.

@xizheyin
Contributor Author

Ok, I'll remove the logic related to checking for commits.

@xizheyin xizheyin force-pushed the pr-behind-commits branch 2 times, most recently from 1cf09ce to 90a5451 Compare April 22, 2025 09:07
Contributor

@Kobzol Kobzol left a comment

Thank you! I think that the implementation looks good now, modulo a few refactorings that I proposed in comments.

@xizheyin xizheyin force-pushed the pr-behind-commits branch from 7d2fdfd to 9eaa777 Compare April 23, 2025 13:10
@xizheyin
Contributor Author

I did some refactoring, mainly the following:

  • rename behind_master to behind_upstream
  • move aging checker to behind_upstream.rs
  • add github_commit in Repository
  • rename commits_in_range to github_commits_in_range (because it returns a GithubCommit instead of a GitCommit)
  • some other nits

Contributor

@Kobzol Kobzol left a comment

Great! Please remove the stray backtick and squash the commits, and we can merge it :)

@xizheyin xizheyin force-pushed the pr-behind-commits branch 2 times, most recently from fd7d71b to dda8a0a Compare April 23, 2025 14:27
@xizheyin
Contributor Author

xizheyin commented Apr 23, 2025

Ok, I finished it. But you may need to test it, I don't have a test environment. :)

@Urgau
Member

Urgau commented Apr 23, 2025

Please wait before merging, I'd like to take a look. I will look at it tonight.

@xizheyin xizheyin force-pushed the pr-behind-commits branch from dda8a0a to 55b8b93 Compare April 24, 2025 04:21
@xizheyin
Contributor Author

Thanks both of you!

I realized that the previous code was wrong.

We shouldn't be checking the parent of issue.head, because the head seems to be the latest commit, and its parent is probably the last commit we submitted (if any) instead of the base commit.

The array returned by pulls/xxx/commits is ordered from oldest to newest, so commits.first() gets the earliest commit in the current PR, and its parent is what we need.

I've changed it now. By the way, I think it would be helpful to add comments to some structures, e.g. a repository field could be annotated with whether it refers to the upstream repository or a fork, since the URLs returned by the GitHub API are quite diverse.
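The difference between the head's parent and the first commit's parent can be sketched like this, assuming the PR's commits are already available as a slice ordered from oldest to newest. `PrCommit` and both functions are hypothetical stand-ins, not the types used in triagebot.

```rust
/// A PR commit as a simplified, hypothetical record:
/// its own SHA and the SHAs of its parents (first parent first).
struct PrCommit {
    sha: String,
    parents: Vec<String>,
}

/// Given PR commits ordered from oldest to newest, returns the SHA of the
/// upstream commit the PR is based on: the first parent of the oldest commit.
fn base_commit_sha(commits: &[PrCommit]) -> Option<&str> {
    commits.first()?.parents.first().map(String::as_str)
}

/// Shows why the head's parent is the wrong choice: for a PR with more than
/// one commit, the head's first parent is itself one of the PR's commits.
fn head_parent_is_inside_pr(commits: &[PrCommit]) -> bool {
    match commits.last().and_then(|head| head.parents.first()) {
        Some(parent) => commits.iter().any(|commit| &commit.sha == parent),
        None => false,
    }
}
```

For a single-commit PR the two approaches agree, which may be why the earlier version looked correct at first.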

@xizheyin xizheyin force-pushed the pr-behind-commits branch from 55b8b93 to 42c47d5 Compare April 24, 2025 04:38
@Kobzol
Contributor

Kobzol commented Apr 24, 2025

Good point, I kind of forgot that PRs can have more than one commit, lol. Thank you!

@Kobzol Kobzol added this pull request to the merge queue Apr 24, 2025
Merged via the queue into rust-lang:master with commit 527bf21 Apr 24, 2025
3 checks passed
@xizheyin xizheyin deleted the pr-behind-commits branch April 24, 2025 10:24
@xizheyin
Contributor Author

This functionality seems to work well. 😁 Do we need to extend it to other code repositories?
rust-lang/rustc-dev-guide#2384 (comment)

@Urgau
Member

Urgau commented May 15, 2025

The original motivation was to do it for rust-lang/rust, so I would be fine if we enable it there.

Feel free to open a PR and assign it to me.

4 participants