scripts: add cherry-pick verification tool with fuzzy matching #10034
base: master
Summary of Changes
Hello @bhandras, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new command-line tool to automate the verification of cherry-picked commits during release processes. Its primary purpose is to ensure that commits brought from a source branch into a release branch are identical or very similar, helping to maintain release quality and identify unintended modifications or discrepancies efficiently.
Highlights
- New Cherry-Pick Verification Tool: Introduces a new bash script, scripts/fuzzy-match-release-branch.sh, designed to verify the integrity and presence of cherry-picked commits between a source branch (e.g., master) and a release branch.
- Dual-Phase Matching Logic: The script employs a two-step verification process: it first attempts an exact match using normalized patch hashes. If no exact match is found, it falls back to a fuzzy matching algorithm that filters source commits by author and subject, then compares normalized diffs to find the closest match based on line difference count.
- Configurable Performance and Usability: The tool supports command-line arguments to specify the source and release branches, as well as limits for the number of commits to scan in the source branch and compare in the release branch, enhancing performance for large repositories. It provides clear, detailed output for matched and unmatched commits, including suggestions for manual git diff for fuzzy matches.
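The exact-match phase described above can be illustrated with a minimal sketch. It is not the script's actual normalization: `normalize_patch` and `patch_hash` are hypothetical helpers, the two patches are made-up samples, and `cksum` stands in for whichever hash tool the script uses. The idea is only that parts of a patch that change on cherry-pick (blob ids, hunk offsets) are stripped before hashing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop the parts of a patch that usually change when a
# commit is cherry-picked (blob ids on "index" lines, hunk line offsets).
normalize_patch() {
  sed -E -e '/^index [0-9a-f]+\.\.[0-9a-f]+/d' \
         -e 's/^@@ -[0-9,]+ \+[0-9,]+ @@.*/@@/'
}

# Hash the normalized patch. cksum is an assumption made for portability.
patch_hash() {
  normalize_patch | cksum | awk '{print $1}'
}

# Sample patch as it might look on the source branch (made-up data).
src_patch='diff --git a/f.txt b/f.txt
index 1111111..2222222 100644
--- a/f.txt
+++ b/f.txt
@@ -10,3 +10,3 @@ func main()
 a
-b
+B
 c'

# Same change on the release branch: different blob ids and offsets.
rc_patch='diff --git a/f.txt b/f.txt
index 3333333..4444444 100644
--- a/f.txt
+++ b/f.txt
@@ -42,3 +42,3 @@ func main()
 a
-b
+B
 c'

src_hash=$(printf '%s\n' "$src_patch" | patch_hash)
rc_hash=$(printf '%s\n' "$rc_patch" | patch_hash)

if [[ "$src_hash" == "$rc_hash" ]]; then
  echo "exact match"
else
  echo "no exact match, fall back to fuzzy matching"
fi
```

Git also ships `git patch-id`, which computes a similar identifier for a patch; whether it would fit this script's needs is not something the discussion settles.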
Code Review
This pull request introduces a very useful script for verifying cherry-picks during a release process. The implementation is solid, using both exact and fuzzy matching. I've provided feedback on a few critical correctness and high-priority performance issues. Specifically, the argument parsing and fuzzy-match scoring have bugs that need to be addressed. Additionally, there are several opportunities to optimize performance by reducing the number of git calls within loops. Addressing these points will make the script more robust and efficient.
while [[ $# -gt 0 ]]; do
  case "$1" in
    --source) SRC_BRANCH="$2"; shift 2 ;;
    --release) RELEASE_BRANCH="$2"; shift 2 ;;
    --scan-limit) SRC_SCAN_LIMIT="$2"; shift 2 ;;
    --limit) RELEASE_LIMIT="$2"; shift 2 ;;
    -h|--help) show_help ;;
    *) echo "Unknown argument: $1"; show_help ;;
  esac
done
The current argument parsing logic is vulnerable to errors when an option is passed without a corresponding value. For example, if the script is run as ./script.sh --source --limit 10, the string "--limit" will be incorrectly assigned to SRC_BRANCH.
To make the parsing more robust, you should validate that a value is provided for each option and that the value is not another option flag.
Suggested change:
-while [[ $# -gt 0 ]]; do
-  case "$1" in
-    --source) SRC_BRANCH="$2"; shift 2 ;;
-    --release) RELEASE_BRANCH="$2"; shift 2 ;;
-    --scan-limit) SRC_SCAN_LIMIT="$2"; shift 2 ;;
-    --limit) RELEASE_LIMIT="$2"; shift 2 ;;
-    -h|--help) show_help ;;
-    *) echo "Unknown argument: $1"; show_help ;;
-  esac
-done
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --source|--release|--scan-limit|--limit)
+      if [[ -z "$2" || "$2" =~ ^- ]]; then
+        echo "Error: Missing value for argument $1" >&2
+        show_help
+      fi
+      case "$1" in
+        --source) SRC_BRANCH="$2" ;;
+        --release) RELEASE_BRANCH="$2" ;;
+        --scan-limit) SRC_SCAN_LIMIT="$2" ;;
+        --limit) RELEASE_LIMIT="$2" ;;
+      esac
+      shift 2
+      ;;
+    -h|--help) show_help ;;
+    *) echo "Unknown argument: $1"; show_help ;;
+  esac
+done
Thank you, replaced!
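For reference, the validated parsing can be sketched as a standalone function. This is an illustration, not the code that landed: `parse_args` is a hypothetical wrapper, the default values and branch names are placeholders, and it returns an error instead of calling `show_help` (whose exit behavior is not shown in this thread). It uses `${2-}` so the check also works when `$2` is unset under `set -u`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the parsing loop so it can be run in isolation.
parse_args() {
  # Placeholder defaults; the real script's defaults are not shown here.
  SRC_BRANCH="" RELEASE_BRANCH="" SRC_SCAN_LIMIT=0 RELEASE_LIMIT=0
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --source|--release|--scan-limit|--limit)
        # ${2-} expands to "" when $2 is unset, so this is safe under `set -u`.
        if [[ -z "${2-}" || "${2-}" == -* ]]; then
          echo "Error: Missing value for argument $1" >&2
          return 1
        fi
        case "$1" in
          --source) SRC_BRANCH="$2" ;;
          --release) RELEASE_BRANCH="$2" ;;
          --scan-limit) SRC_SCAN_LIMIT="$2" ;;
          --limit) RELEASE_LIMIT="$2" ;;
        esac
        shift 2
        ;;
      *)
        echo "Unknown argument: $1" >&2
        return 1
        ;;
    esac
  done
}

# A well-formed invocation populates the variables.
parse_args --source master --release release-branch --limit 5 \
  && echo "source=$SRC_BRANCH release=$RELEASE_BRANCH limit=$RELEASE_LIMIT"

# An option followed by another option is rejected instead of being assigned.
parse_args --source --limit 10 || echo "rejected: --source has no value"
```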
for j in "${!SRC_COMMIT_META[@]}"; do
  if [[ "${SRC_COMMIT_META[$j]}" == "$meta_key" ]]; then
    diff=$(diff -u <(echo "$rc_patch") <(echo "${SRC_PATCHES[$j]}") || true)
    score=$(echo "$diff" | grep '^[-+]' | wc -l | xargs)
The current method for calculating the diff score incorrectly includes diff header lines (e.g., --- a/file.txt and +++ b/file.txt) in its count because they also start with + or -. This inflates the score and can lead to an inaccurate fuzzy match.
The calculation should be adjusted to count only the lines representing actual content changes.
Suggested change:
-score=$(echo "$diff" | grep '^[-+]' | wc -l | xargs)
+score=$(echo "$diff" | grep -vE '^(--- |\+\+\+ )' | grep -c '^[-+]')
Done.
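The effect of the header lines on the score can be seen in a small sketch. The function names and the sample inputs are made up for illustration; the two pipelines mirror the original and the suggested scoring. The `|| true` is there because `grep -c` exits non-zero when it counts zero lines.

```shell
#!/usr/bin/env bash
# Counts every line starting with + or -, including the two file headers.
score_with_headers() {
  diff -u <(printf '%s\n' "$1") <(printf '%s\n' "$2") | grep -c '^[-+]' || true
}

# Drops the "--- " and "+++ " header lines first, then counts changed lines.
score_content_only() {
  diff -u <(printf '%s\n' "$1") <(printf '%s\n' "$2") \
    | grep -vE '^(--- |\+\+\+ )' | grep -c '^[-+]' || true
}

old=$'x\ny\nz'
new=$'x\nY\nz'

echo "with headers: $(score_with_headers "$old" "$new")"   # 4: two headers + two changes
echo "content only: $(score_content_only "$old" "$new")"   # 2: one removed, one added
```

One caveat with the header filter: a removed content line that itself begins with "-- " appears in the diff as "--- ..." and is filtered out as well, so the score can be slightly undercounted in that case.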
author=$(git log -1 --pretty=format:"%an <%ae>" "$commit")
subject=$(git log -1 --pretty=format:"%s" "$commit")
The script makes two separate git log calls for each source commit to get the author and subject. This is inefficient and can significantly slow down the script when scanning a large number of commits.
These calls can be combined into a single git log command using a custom format string with a null character separator for safe parsing. This optimization should also be applied to lines 95-96 where release commit info is fetched.
Suggested change:
-author=$(git log -1 --pretty=format:"%an <%ae>" "$commit")
-subject=$(git log -1 --pretty=format:"%s" "$commit")
+IFS=$'\0' read -r author subject < <(git log -1 --pretty=format:"%an <%ae>%x00%s" "$commit")
This broke the script, kept the original solution.
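A likely reason the suggestion broke the script: bash strings cannot hold a NUL byte, so `$'\0'` expands to an empty string, IFS ends up empty, and `read` performs no field splitting. A sketch of one way to read NUL-separated fields is below. It uses `read -d ''`, which stops at the first NUL byte. `fake_git_log` is a stand-in with made-up data so the example runs without a repository.

```shell
#!/usr/bin/env bash
# Stand-in for: git log -1 --pretty=format:"%an <%ae>%x00%s" "$commit"
fake_git_log() {
  printf '%s\0%s' 'Alice <alice@example.com>' 'scripts: add tool'
}

# The first read stops at the NUL separator and yields the author.
# The second read takes the rest of the stream, which is the subject.
{
  IFS= read -r -d '' author
  IFS= read -r subject || true   # no trailing newline, so read returns non-zero
} < <(fake_git_log)

echo "author:  $author"
echo "subject: $subject"
```

This keeps a single git call per commit, although the author's reply notes that efficiency is not the main concern for this script.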
rc_author=$(git log -1 --pretty=format:"%an <%ae>" "$rc_commit")
rc_subject=$(git log -1 --pretty=format:"%s" "$rc_commit")
Similar to the source commit indexing loop, two separate git log calls are made here for each release commit. This is inefficient.
Combining these into a single git log call will improve performance.
Suggested change:
-rc_author=$(git log -1 --pretty=format:"%an <%ae>" "$rc_commit")
-rc_subject=$(git log -1 --pretty=format:"%s" "$rc_commit")
+IFS=$'\0' read -r rc_author rc_subject < <(git log -1 --pretty=format:"%an <%ae>%x00%s" "$rc_commit")
This broke the script, kept the original solution. Efficiency is not the key here.
found_exact=""
for j in "${!SRC_PATCH_HASHES[@]}"; do
  if [[ "${SRC_PATCH_HASHES[$j]}" == "$rc_patch_hash" ]]; then
    found_exact="${SRC_COMMITS_ARRAY[$j]}"
    break
  fi
done

if [[ -n "$found_exact" ]]; then
  src_subject=$(git log -1 --pretty=format:"%s" "$found_exact")
  src_author=$(git log -1 --pretty=format:"%an <%ae>" "$found_exact")
When an exact patch match is found, the script makes new git log calls to get the source commit's subject and author. This is redundant and inefficient because this information was already collected and stored in SRC_COMMIT_META during the indexing phase.
You can improve performance by refactoring this section to retrieve the metadata from the pre-populated arrays instead of calling git again.
Suggested change:
-found_exact=""
-for j in "${!SRC_PATCH_HASHES[@]}"; do
-  if [[ "${SRC_PATCH_HASHES[$j]}" == "$rc_patch_hash" ]]; then
-    found_exact="${SRC_COMMITS_ARRAY[$j]}"
-    break
-  fi
-done
-if [[ -n "$found_exact" ]]; then
-  src_subject=$(git log -1 --pretty=format:"%s" "$found_exact")
-  src_author=$(git log -1 --pretty=format:"%an <%ae>" "$found_exact")
+found_exact_index=-1
+for j in "${!SRC_PATCH_HASHES[@]}"; do
+  if [[ "${SRC_PATCH_HASHES[$j]}" == "$rc_patch_hash" ]]; then
+    found_exact_index=$j
+    break
+  fi
+done
+if [[ $found_exact_index -ne -1 ]]; then
+  found_exact="${SRC_COMMITS_ARRAY[$found_exact_index]}"
+  meta_info="${SRC_COMMIT_META[$found_exact_index]}"
+  src_subject="${meta_info%__*}"
+  src_author="${meta_info#*__}"
Done.
Very cool idea for a script!
Have a couple of ideas on how to improve it a bit.
if [[ "$RELEASE_LIMIT" -gt 0 ]]; then
  RELEASE_COMMITS=$(echo "$RELEASE_COMMITS" | head -n "$RELEASE_LIMIT")
fi
RELEASE_COMMITS=$(echo "$RELEASE_COMMITS" | tail -r)
The -r option seems to be BSD-specific; it doesn't exist on Linux. Also, doesn't tail by default take the last 10 lines only? Which would mean we only look at the last 10 commits?
Use tac instead, which reverses the order line by line as well?
Replaced with the portable RELEASE_COMMITS=$(echo "$RELEASE_COMMITS" | awk '{ lines[NR] = $0 } END { for (i = NR; i > 0; i--) print lines[i] }')
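The portable reversal can be wrapped in a small helper, sketched below. `reverse_lines` is a hypothetical name and the commit ids are placeholders; it prefers `tac` where available and otherwise falls back to the awk one-liner quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reverse stdin line by line on both GNU and BSD systems.
reverse_lines() {
  if command -v tac >/dev/null 2>&1; then
    tac
  else
    # Buffer all lines, then print them from last to first.
    awk '{ lines[NR] = $0 } END { for (i = NR; i > 0; i--) print lines[i] }'
  fi
}

# Placeholder commit ids, newest first, as git log would list them.
commits=$'c3\nc2\nc1'
reversed=$(printf '%s\n' "$commits" | reverse_lines)
printf '%s\n' "$reversed"
```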
  fi
done

if [[ -n "$best_index" ]]; then
Can we also summarize below how many fuzzy matches we found?
Done.
for commit in "${SRC_COMMITS_ARRAY[@]}"; do
  author=$(git log -1 --pretty=format:"%an <%ae>" "$commit")
  subject=$(git log -1 --pretty=format:"%s" "$commit")
  meta_key="${subject}__${author}"
Can we take into account the author date as well? Since that shouldn't be changed by a cherry-pick or rebase.
Sure, that's a great idea!
echo ""
echo "🔍 Diff of release commit:"
echo "---------------------------------------------"
git show "$rc_commit" | sed 's/^/ /'
Instead of showing the diff of the release commit, could we show the difference between the release commit and the closest fuzzy match commit?
Done.
d66524a to dc19321
dc19321 to 29738cf
This script compares a release branch against a source branch (e.g. master) to verify that all cherry-picked commits are unmodified. It first attempts fast matching using normalized patch hashes.
If no exact match is found, it falls back to a fuzzy matching mechanism:
- Filters source commits by matching author and commit subject
- Compares normalized diffs using diff -u
- Selects the closest match based on line difference count
Useful for verifying cherry-picks or rebased commits during release processes. Supports scan and compare limits for performance.
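The fuzzy fallback described in this commit message can be sketched roughly as follows. All data here is made up, `SRC_META` and `score` are illustrative names, and the metadata key uses a plain "|" separator purely for the example. Only candidates whose metadata equals the release commit's metadata are scored, and the lowest score wins.

```shell
#!/usr/bin/env bash
# Number of added/removed content lines between two patch texts.
score() {
  diff -u <(printf '%s\n' "$1") <(printf '%s\n' "$2") \
    | grep -vE '^(--- |\+\+\+ )' | grep -c '^[-+]' || true
}

# Release commit under test (made-up metadata and patch body).
rc_meta='fix: handle nil|Alice <alice@example.com>'
rc_patch=$'a\nb\nc'

# Indexed source commits: metadata key and normalized patch per commit.
SRC_META=(
  'other change|Bob <bob@example.com>'
  'fix: handle nil|Alice <alice@example.com>'
  'fix: handle nil|Alice <alice@example.com>'
)
SRC_PATCHES=(
  $'a\nb\nc'
  $'a\nX\nY'
  $'a\nb\nC'
)

best_index=""
best_score=999999
for j in "${!SRC_META[@]}"; do
  # Filter: only commits with the same subject and author are candidates.
  [[ "${SRC_META[$j]}" == "$rc_meta" ]] || continue
  s=$(score "$rc_patch" "${SRC_PATCHES[$j]}")
  if (( s < best_score )); then
    best_score=$s
    best_index=$j
  fi
done

echo "best candidate: index=$best_index score=$best_score"
```

Note that index 0 has an identical patch but is never scored, because its metadata does not match. That is the trade-off of filtering by author and subject first.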
29738cf to 31dafd3
@gemini-code-assist review
Code Review
This pull request introduces a bash script for verifying cherry-picked commits between branches, covering both exact and fuzzy matching. The review focuses on improving robustness, performance, and correctness. Key suggestions include using a more robust method for handling commit metadata, optimizing git command usage, enhancing error handling for git fetch, and aligning the fuzzy matching logic more closely with the stated intent.
author=$(git log -1 --pretty=format:"%an <%ae>" "$commit" 2>/dev/null) || continue
subject=$(git log -1 --pretty=format:"%s" "$commit" 2>/dev/null) || continue
authordate=$(git log -1 --pretty=format:"%ai" "$commit" 2>/dev/null) || continue
meta_key="${subject}__${author}__${authordate}"
Spawning three separate git log processes for each commit is inefficient. This can be combined into a single git log call. Also, using __ as a delimiter is not robust, as commit subjects or author names may contain this sequence, causing parsing issues later on. Consider using a single git log call with a more robust delimiter like the unit separator (%x1f).
Suggested change:
-author=$(git log -1 --pretty=format:"%an <%ae>" "$commit" 2>/dev/null) || continue
-subject=$(git log -1 --pretty=format:"%s" "$commit" 2>/dev/null) || continue
-authordate=$(git log -1 --pretty=format:"%ai" "$commit" 2>/dev/null) || continue
-meta_key="${subject}__${author}__${authordate}"
+info=$(git log -1 --pretty=format:"%s%x1f%an <%ae>%x1f%ai" "$commit" 2>/dev/null) || continue
+IFS=$'\x1f' read -r subject author authordate <<< "$info"
+meta_key="$info" # Use the raw, delimited output as the meta key
src_subject="${meta_info%%__*}"
rest="${meta_info#*__}"
src_author="${rest%%__*}"
src_authordate="${rest##*__}"
The metadata parsing logic splits on __, which is fragile and will fail if the delimiter appears in the commit subject or author name. If you adopt the suggestion of using a robust delimiter like \x1f (unit separator), update the parsing logic here to match.
Suggested change:
-src_subject="${meta_info%%__*}"
-rest="${meta_info#*__}"
-src_author="${rest%%__*}"
-src_authordate="${rest##*__}"
+IFS=$'\x1f' read -r src_subject src_author src_authordate <<< "$meta_info" # Parse metadata using the robust delimiter
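The difference between the two delimiters can be shown with a subject that happens to contain "__". The sample values below are made up, and the `info` string stands in for the output of the single git log call in the suggestion.

```shell
#!/usr/bin/env bash
sep=$'\x1f'   # ASCII unit separator, unlikely to appear in commit metadata

# Stand-in for: git log -1 --pretty=format:"%s%x1f%an <%ae>%x1f%ai" "$commit"
info="scripts: add__tool${sep}Alice <alice@example.com>${sep}2025-07-01 10:00:00 +0200"

# Splitting on the unit separator recovers all three fields intact.
IFS=$'\x1f' read -r subject author authordate <<< "$info"
echo "subject:    $subject"
echo "author:     $author"
echo "authordate: $authordate"

# The "__" based key truncates the subject at its first "__".
legacy_key="${subject}__${author}__${authordate}"
legacy_subject="${legacy_key%%__*}"
echo "legacy subject: $legacy_subject"
```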
echo ""

echo "🔄 Fetching latest refs..."
git fetch --all --quiet || true
Using || true after git fetch will suppress any errors. If the fetch fails (e.g., due to network issues), the script will continue with potentially stale data, leading to incorrect results. It's safer to remove || true and let the script exit if the fetch fails, ensuring it always operates on complete data.
Suggested change:
-git fetch --all --quiet || true
+git fetch --all --quiet
fuzzy_candidates=0

for j in "${!SRC_COMMIT_META[@]}"; do
  if [[ "${SRC_COMMIT_META[$j]}" == "$meta_key" ]]; then
The fuzzy matching logic requires the author, subject, and author date to be identical to find a candidate. This is stricter than intended. Matching on authordate can be too strict, especially for rebased commits where the author date might be updated. Consider making the match less strict by not comparing the date.
match_author=$(git log -1 --pretty=format:"%an <%ae>" "$match_commit")
match_subject=$(git log -1 --pretty=format:"%s" "$match_commit")
These git log calls are redundant. The author and subject for the match_commit have already been fetched and are available in the ${SRC_COMMIT_META[$best_index]} variable. Parse this information directly from the metadata array instead.
Suggested change:
-match_author=$(git log -1 --pretty=format:"%an <%ae>" "$match_commit")
-match_subject=$(git log -1 --pretty=format:"%s" "$match_commit")
+meta_info="${SRC_COMMIT_META[$best_index]}"
+IFS=$'\x1f' read -r match_subject match_author _ <<< "$meta_info" # Parse metadata; discard date
Code Review
The pull request introduces a script to verify cherry-picked commits between release and source branches using both fast and fuzzy matching. The script appears well-structured and addresses a useful need. However, there are several areas where the script can be improved, including the hashing algorithm, fuzzy matching logic, character encoding, and error handling.
  if command -v md5sum >/dev/null 2>&1; then
    md5sum | awk '{print $1}'
  else
    md5 | awk '{print $NF}'
  fi
}
The script uses md5sum or md5 for hashing. MD5 is considered cryptographically broken and should not be used for security-sensitive applications. While this script isn't directly security-sensitive, consider using SHA256 (sha256sum) for better collision resistance, if available, or at least warn the user that MD5 is being used and may not be reliable for large numbers of commits. If sha256sum is not available, the script should exit with an error message.
Suggested change:
-  if command -v md5sum >/dev/null 2>&1; then
-    md5sum | awk '{print $1}'
-  else
-    md5 | awk '{print $NF}'
-  fi
-}
+  if command -v sha256sum >/dev/null 2>&1; then
+    sha256sum | awk '{print $1}'
+  elif command -v md5sum >/dev/null 2>&1; then
+    echo "WARNING: Using md5sum for hashing. Consider using sha256sum if available." >&2
+    md5sum | awk '{print $1}'
+  elif command -v md5 >/dev/null 2>&1; then
+    echo "WARNING: Using md5 for hashing. Consider using sha256sum if available." >&2
+    md5 | awk '{print $NF}'
+  else
+    echo "ERROR: No hashing program found (sha256sum, md5sum, or md5)." >&2
+    exit 1
+  fi
For this specific use case md5 is fine.
if [[ "${SRC_COMMIT_META[$j]}" == "$meta_key" ]]; then
  ((fuzzy_candidates++))
  diff=$(diff -u <(echo "$rc_patch") <(echo "${SRC_PATCHES[$j]}") || true)
  score=$(echo "$diff" | grep -vE '^(--- |\+\+\+ )' | grep -c '^[-+]')
  if [[ "$score" -lt "$best_score" ]]; then
    best_score=$score
    best_index=$j
  fi
The fuzzy matching logic compares commit metadata directly. This can lead to false positives if commit messages are duplicated across different commits. Consider incorporating the commit hash itself into the comparison to reduce the likelihood of false positives. This would involve adding the commit hash to the meta_key and using it in the comparison.
Suggested change:
-if [[ "${SRC_COMMIT_META[$j]}" == "$meta_key" ]]; then
-  ((fuzzy_candidates++))
-  diff=$(diff -u <(echo "$rc_patch") <(echo "${SRC_PATCHES[$j]}") || true)
-  score=$(echo "$diff" | grep -vE '^(--- |\+\+\+ )' | grep -c '^[-+]')
-  if [[ "$score" -lt "$best_score" ]]; then
-    best_score=$score
-    best_index=$j
-  fi
+meta_key="${subject}__${author}__${authordate}__${commit}"
Commit hash is not expected to be the same.
fuzzy_candidates=0

for j in "${!SRC_COMMIT_META[@]}"; do
  if [[ "${SRC_COMMIT_META[$j]}" == "$meta_key" ]]; then
The script compares the entire SRC_COMMIT_META[$j] with $meta_key. This is prone to errors if there are slight variations in the author or date format. It's better to compare the individual components (subject, author, date) separately to allow for minor discrepancies.
Suggested change:
-if [[ "${SRC_COMMIT_META[$j]}" == "$meta_key" ]]; then
+rc_subject_base=$(echo "$meta_key" | cut -d'_' -f1)
+src_subject_base=$(echo "${SRC_COMMIT_META[$j]}" | cut -d'_' -f1)
+if [[ "$rc_subject_base" == "$src_subject_base" ]]; then
for j in "${!SRC_COMMIT_META[@]}"; do
  if [[ "${SRC_COMMIT_META[$j]}" == "$meta_key" ]]; then
    ((fuzzy_candidates++))
    diff=$(diff -u <(echo "$rc_patch") <(echo "${SRC_PATCHES[$j]}") || true)
The diff command is used without specifying a character encoding. This can lead to incorrect results if the source and release branches use different encodings. Specify the encoding explicitly using the -a option to treat all files as text and force a line-by-line comparison.
Suggested change:
-diff=$(diff -u <(echo "$rc_patch") <(echo "${SRC_PATCHES[$j]}") || true)
+diff=$(diff -au <(echo "$rc_patch") <(echo "${SRC_PATCHES[$j]}") || true)
Encodings are expected to be the same.
echo ""

echo "🔧 Check it manually (patch diff):"
echo " git diff $match_commit $rc_commit -- \$(git show --pretty=\"\" --name-only $rc_commit)"
The git diff command includes a || true at the end. This prevents the script from exiting if the diff command fails (e.g., if the files have been deleted or renamed). However, it also masks potential errors that could indicate a problem with the comparison. Consider removing || true and handling the error explicitly, or logging the error message for debugging purposes.
Suggested change:
-echo " git diff $match_commit $rc_commit -- \$(git show --pretty=\"\" --name-only $rc_commit)"
+git diff "$match_commit" "$rc_commit" -- $changed_files | sed 's/^/ /'
I tried running the script and it aborted after the first non-exact match. According to ChatGPT it's the combination of
Thanks for the feedback! I think on Mac (where I developed and tested), bash's behavior is more forgiving and will continue just fine. Will test it on Linux too to make sure it's consistent.
This script compares a release branch against a source branch (e.g. master) to verify that all cherry-picked commits are unmodified. It first attempts fast matching using normalized patch hashes.
If no exact match is found, it falls back to a fuzzy matching mechanism:
- Filters source commits by matching author and commit subject
- Compares normalized diffs using diff -u
- Selects the closest match based on line difference count
Useful for verifying cherry-picks or rebased commits during release processes. Supports scan and compare limits for performance.
To test I ran it on the rc branch (#9986):