Skip to content

Conversation

@Rocketknight1
Copy link
Member

This draft PR adds a Github action to assign reviewers. It takes a similar approach to the CODEOWNERS file, but it tries to choose one or two most relevant reviewers rather than just pinging absolutely everyone.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1
Copy link
Member Author

Should be ready, pinging @ydshieh and @ArthurZucker for review! (In future, these pings will be automatic 😅 )

@ydshieh
Copy link
Collaborator

ydshieh commented Jan 24, 2025

The pull_request_target will checkout to the base branch of the pull request (which is usually the main).

That is why it is secure by default (and not blocked by the need approval configuration).

So here we need to find the actual PR commit, checkout to it, but keep the job/steps secure when doing this.

Ref

It's kind involved. If you prefer, I can work on this workflow part next week.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fully trust @ydshieh on this one for the safety part!

@ydshieh
Copy link
Collaborator

ydshieh commented Feb 4, 2025

Guess I can take over this PR unless @Rocketknight1 has any other thought. Will do it this week.

@Rocketknight1
Copy link
Member Author

I'm sorry, I just got distracted by other stuff! The PR should be ready, though - if you want to merge it or take over, go ahead @ydshieh

@ydshieh
Copy link
Collaborator

ydshieh commented Feb 5, 2025

@Rocketknight1 @ArthurZucker

I took a look, and @Rocketknight1 is using pull_request_target event, without checkout to the PR head branch, so the action is running against the main branch.

Furthermore, the script .github/scripts/assign_reviewers.py, it gets the touched files without checkout to the PR head branch neither: it uses PyGithub and avoids the checkout.

Therefore, my previous concern regarding security is not presenting. Great job @Rocketknight1 !

@ydshieh
Copy link
Collaborator

ydshieh commented Feb 5, 2025

I will review this PR in a more detailed manner later.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 🤗

@@ -0,0 +1,71 @@
import os
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing licence

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and potentially doc helper

@Rocketknight1
Copy link
Member Author

cc @ydshieh I'm seeing typing failures that I think your PR already fixed - do you know why the CI is still running the old verison?

@ydshieh
Copy link
Collaborator

ydshieh commented Feb 13, 2025

maybe you haven't rebase / merge the latest main branch ..?

@Rocketknight1
Copy link
Member Author

It's rebased!

@Rocketknight1
Copy link
Member Author

A second rebase fixed it - maybe I just needed to clear a cache @ydshieh! Sorry to bother you.

Should be ready for final review now though, and licence is added @ArthurZucker !

Copy link
Collaborator

@ydshieh ydshieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works for me (especially no security concerns I can think of), only a few nit questions.

Great 💯

print(f"Reviewers already requested: {users_requested}")
return

locs_per_owner = Counter()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is locs ??

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I somehow missed this until now! LOCs = lines of code

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe i am just not familiar with common abbreviation here. But if you think it would help in general / in the future, let's add a comment about the meaning.

locs_per_owner[owner] += file.changes

# Assign the top 2 based on locs changed as reviewers
top_owners = locs_per_owner.most_common(3)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 here seems a bit strange

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for 3 is that the author might be included as well. Therefore, we take the top 3 in order, discard the author if present, and if we still have 3 then we discard the lowest value.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds like a good comment to be included :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I cleaned it up to be a bit more obvious!


# More high-level globs catch cases when specific rules later don't apply
/src/transformers/models/*/*processing* @molbap @yonigozlan @qubvel
/src/transformers/models/*/image_processing* @qubvel
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess @qubvel will be ping for all /src/transformers/models/*/image_processing_*_fast* too? Not sure if he will be super happy

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point! This needs a rewrite

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@Rocketknight1 Rocketknight1 force-pushed the auto-assign-reviewers branch from 6a1bbb6 to dc39a55 Compare March 4, 2025 14:38
@machinelearningprodigy
Copy link

The auto-assign action picks 2-3 reviewers based on file changes. How exactly does it rank them? Should we refine the logic to avoid unnecessary pings (e.g., skipping PR authors automatically)?

@Rocketknight1
Copy link
Member Author

@machinelearningprodigy They're ranked by how many lines of code touch files that are "owned" by that person, where ownership is defined in the codeowners_for_review_action file.

@Rocketknight1 Rocketknight1 force-pushed the auto-assign-reviewers branch from 9a860bf to ccb865b Compare March 7, 2025 12:11
@Rocketknight1 Rocketknight1 merged commit f2e197c into main Mar 7, 2025
11 checks passed
@Rocketknight1 Rocketknight1 deleted the auto-assign-reviewers branch March 7, 2025 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants