
scripts: add cherry-pick verification tool with fuzzy matching #10034


Open · wants to merge 1 commit into master

@bhandras (Collaborator) commented Jul 4, 2025

This script compares a release branch against a source branch (e.g. master) to verify that all cherry-picked commits are unmodified. It first attempts fast matching using normalized patch hashes.
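The PR text does not say how patches are normalized. The sketch below is therefore an assumption, not the script's actual code. It treats two patches as the same change when they differ only in `index` lines (blob IDs) and hunk-header line numbers. It then hashes the result with `md5sum`, falling back to macOS `md5` the same way the script's own hash helper does. `normalize_patch`, `hash_stdin` and `patch_hash` are hypothetical names. Git also provides `git patch-id --stable`, which computes a comparable ID while ignoring line numbers and whitespace.

```bash
#!/usr/bin/env bash
# Sketch (assumed normalization rules): hash a patch after stripping the parts
# that legitimately change when a commit is cherry-picked onto another branch.

normalize_patch() {
  # Drop "index <blob>..<blob>" lines, since blob IDs depend on the tree, and
  # strip hunk-header line numbers, which shift between branches.
  sed -e '/^index [0-9a-f]*\.\.[0-9a-f]*/d' \
      -e 's/^@@ -[0-9,]* +[0-9,]* @@/@@/'
}

hash_stdin() {
  # GNU coreutils provides md5sum; macOS provides md5.
  if command -v md5sum >/dev/null 2>&1; then
    md5sum | awk '{print $1}'
  else
    md5 | awk '{print $NF}'
  fi
}

patch_hash() {
  # Usage: git show --format= "$commit" | patch_hash
  normalize_patch | hash_stdin
}
```

Under these rules, a clean cherry-pick whose hunks only moved gets the same hash on both branches, while editing any line changes it.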

If no exact match is found, it falls back to a fuzzy matching mechanism:

  • Filters source commits by matching author and commit subject
  • Compares normalized diffs using diff -u
  • Selects the closest match based on line difference count
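The fuzzy fallback described in the list above could look roughly like the sketch below. It is not the script's code. `SRC_META` and `SRC_PATCHES` are hypothetical parallel arrays holding one metadata key (for example, subject and author) and one normalized patch per source commit. The score counts `-`/`+` lines in the `diff -u` output after skipping its first two lines, which are always the `---`/`+++` file headers. As a result, a content line that happens to begin with `--` is still counted.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical parallel arrays, filled elsewhere: SRC_META[i] is a metadata
# key and SRC_PATCHES[i] the normalized patch of the i-th source commit.

diff_score() {
  # Counts -/+ lines in a unified diff of two patches. The first two output
  # lines of diff -u are the ---/+++ file headers, so they are skipped.
  # diff exits 1 when the inputs differ; only a status of 2 is a real error.
  { diff -u <(printf '%s\n' "$1") <(printf '%s\n' "$2") || [[ $? -eq 1 ]]; } |
    awk 'NR > 2 && /^[-+]/ { n++ } END { print n + 0 }'
}

best_fuzzy_match() {
  # $1: metadata key of the release commit, $2: its normalized patch.
  # Prints the index of the closest source commit with the same key,
  # or nothing if no source commit shares the key.
  local key=$1 patch=$2 best_index="" best_score="" j score
  for j in "${!SRC_META[@]}"; do
    [[ "${SRC_META[$j]}" == "$key" ]] || continue
    score=$(diff_score "$patch" "${SRC_PATCHES[$j]}")
    if [[ -z "$best_score" || "$score" -lt "$best_score" ]]; then
      best_score=$score
      best_index=$j
    fi
  done
  if [[ -n "$best_index" ]]; then
    echo "$best_index"
  fi
}
```

Ties keep the earliest index. A score of 0 means the two patches are identical after normalization.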

Useful for verifying cherry-picks or rebased commits during release processes. Supports scan and compare limits for performance.

To test it, I ran it on the RC branch (#9986):

./scripts/fuzzy-match-release-branch.sh --source master --release 0-19-2-branch-rc1 --limit 92

@gemini-code-assist bot left a comment

Summary of Changes

Hello @bhandras, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new command-line tool to automate the verification of cherry-picked commits during release processes. Its primary purpose is to ensure that commits brought from a source branch into a release branch are identical or very similar, helping to maintain release quality and identify unintended modifications or discrepancies efficiently.

Highlights

  • New Cherry-Pick Verification Tool: Introduces a new bash script, scripts/fuzzy-match-release-branch.sh, designed to verify the integrity and presence of cherry-picked commits between a source branch (e.g., master) and a release branch.
  • Dual-Phase Matching Logic: The script employs a two-step verification process: it first attempts an exact match using normalized patch hashes. If no exact match is found, it falls back to a fuzzy matching algorithm that filters source commits by author and subject, then compares normalized diffs to find the closest match based on line difference count.
  • Configurable Performance and Usability: The tool supports command-line arguments to specify the source and release branches, as well as limits on the number of commits to scan in the source branch and to compare in the release branch, improving performance for large repositories. It provides clear, detailed output for matched and unmatched commits, including a suggested git diff command for manually inspecting fuzzy matches.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a very useful script for verifying cherry-picks during a release process. The implementation is solid, using both exact and fuzzy matching. I've provided feedback on a few critical correctness and high-priority performance issues. Specifically, the argument parsing and fuzzy-match scoring have bugs that need to be addressed. Additionally, there are several opportunities to optimize performance by reducing the number of git calls within loops. Addressing these points will make the script more robust and efficient.

Comment on lines 21 to 43
while [[ $# -gt 0 ]]; do
  case "$1" in
    --source) SRC_BRANCH="$2"; shift 2 ;;
    --release) RELEASE_BRANCH="$2"; shift 2 ;;
    --scan-limit) SRC_SCAN_LIMIT="$2"; shift 2 ;;
    --limit) RELEASE_LIMIT="$2"; shift 2 ;;
    -h|--help) show_help ;;
    *) echo "Unknown argument: $1"; show_help ;;
  esac
done


critical

The current argument parsing logic is vulnerable to errors when an option is passed without a corresponding value. For example, if the script is run as ./script.sh --source --limit 10, the string "--limit" will be incorrectly assigned to SRC_BRANCH.

To make the parsing more robust, you should validate that a value is provided for each option and that the value is not another option flag.

Suggested change
while [[ $# -gt 0 ]]; do
  case "$1" in
    --source|--release|--scan-limit|--limit)
      if [[ -z "$2" || "$2" =~ ^- ]]; then
        echo "Error: Missing value for argument $1" >&2
        show_help
      fi
      case "$1" in
        --source) SRC_BRANCH="$2" ;;
        --release) RELEASE_BRANCH="$2" ;;
        --scan-limit) SRC_SCAN_LIMIT="$2" ;;
        --limit) RELEASE_LIMIT="$2" ;;
      esac
      shift 2
      ;;
    -h|--help) show_help ;;
    *) echo "Unknown argument: $1"; show_help ;;
  esac
done
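One caveat applies to this suggestion if the script runs under `set -u`, as the later discussion of `set -euo pipefail` indicates. When an option is the last argument, `"$2"` is unset, so bash aborts with an "unbound variable" error before the friendly message is printed. The sketch below guards the lookahead with `${2:-}`. `show_help` here is a hypothetical stand-in for the script's own helper, and wrapping the loop in `parse_args` is only for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

show_help() {
  # Hypothetical stand-in for the script's own help function.
  echo "Usage: $0 --source <branch> --release <branch> [--scan-limit N] [--limit N]" >&2
  exit 1
}

parse_args() {
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --source|--release|--scan-limit|--limit)
        # ${2:-} expands to "" instead of failing under set -u when $1 is last.
        if [[ -z "${2:-}" || "${2:-}" == -* ]]; then
          echo "Error: Missing value for argument $1" >&2
          show_help
        fi
        case "$1" in
          --source) SRC_BRANCH="$2" ;;
          --release) RELEASE_BRANCH="$2" ;;
          --scan-limit) SRC_SCAN_LIMIT="$2" ;;
          --limit) RELEASE_LIMIT="$2" ;;
        esac
        shift 2
        ;;
      -h|--help) show_help ;;
      *) echo "Unknown argument: $1" >&2; show_help ;;
    esac
  done
}
```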

Collaborator Author

Thank you, replaced!

for j in "${!SRC_COMMIT_META[@]}"; do
  if [[ "${SRC_COMMIT_META[$j]}" == "$meta_key" ]]; then
    diff=$(diff -u <(echo "$rc_patch") <(echo "${SRC_PATCHES[$j]}") || true)
    score=$(echo "$diff" | grep '^[-+]' | wc -l | xargs)


critical

The current method for calculating the diff score incorrectly includes diff header lines (e.g., --- a/file.txt and +++ b/file.txt) in its count because they also start with + or -. This inflates the score and can lead to an inaccurate fuzzy match.

The calculation should be adjusted to count only the lines representing actual content changes.

Suggested change
score=$(echo "$diff" | grep -vE '^(--- |\+\+\+ )' | grep -c '^[-+]')
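This suggestion needs one more check if the script keeps `set -euo pipefail`. `grep -c` prints `0` but exits with status 1 when nothing matches, and with `pipefail` that status becomes the status of the whole pipeline. An assignment such as `score=$(...)` would then abort the script whenever the count is 0, for example when the two patches are identical. The minimal sketch below appends `|| true` so the printed count survives. `score_diff` is a hypothetical helper that takes the `diff -u` output as its argument.

```bash
#!/usr/bin/env bash
set -euo pipefail

score_diff() {
  # $1: output of diff -u. Prints the number of changed (-/+) lines,
  # excluding the ---/+++ file headers.
  # grep -c exits 1 when the count is 0; "|| true" keeps that from aborting
  # the script under set -e and pipefail while preserving the printed "0".
  printf '%s\n' "$1" | grep -vE '^(--- |\+\+\+ )' | grep -c '^[-+]' || true
}
```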

Collaborator Author

Done.

Comment on lines 78 to 79
author=$(git log -1 --pretty=format:"%an <%ae>" "$commit")
subject=$(git log -1 --pretty=format:"%s" "$commit")


high

The script makes two separate git log calls for each source commit to get the author and subject. This is inefficient and can significantly slow down the script when scanning a large number of commits.

These calls can be combined into a single git log command using a custom format string with a null character separator for safe parsing. This optimization should also be applied to lines 95-96 where release commit info is fetched.

Suggested change
IFS=$'\0' read -r author subject < <(git log -1 --pretty=format:"%an <%ae>%x00%s" "$commit")

Collaborator Author

This broke the script, kept the original solution.
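There is a likely explanation for the breakage. Bash strings cannot hold NUL bytes, so `$'\0'` expands to an empty string and `IFS=$'\0'` disables field splitting altogether. The subject is then not separated from the author. In addition, `--pretty=format:` emits no trailing newline for a single commit, so `read` hits end-of-file and returns 1, which aborts a script running under `set -e`. The sketch below uses the ASCII unit separator (`%x1f`, as the review suggests later) together with `tformat:`, which terminates the record with a newline. `read_commit_meta` and the `META_*` variables are hypothetical names.

```bash
#!/usr/bin/env bash
set -euo pipefail

read_commit_meta() {
  # Reads one "subject<US>author<US>author-date" record from stdin, where <US>
  # is the ASCII unit separator (0x1f). Unlike "__", it is not expected to
  # appear in ordinary subjects or author names.
  IFS=$'\x1f' read -r META_SUBJECT META_AUTHOR META_DATE
}

# Usage (tformat: ends the record with a newline, so read returns 0):
#   read_commit_meta < <(git log -1 --pretty=tformat:'%s%x1f%an <%ae>%x1f%ai' "$commit")
```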

Comment on lines 95 to 96
rc_author=$(git log -1 --pretty=format:"%an <%ae>" "$rc_commit")
rc_subject=$(git log -1 --pretty=format:"%s" "$rc_commit")


high

Similar to the source commit indexing loop, two separate git log calls are made here for each release commit. This is inefficient.

Combining these into a single git log call will improve performance.

Suggested change
IFS=$'\0' read -r rc_author rc_subject < <(git log -1 --pretty=format:"%an <%ae>%x00%s" "$rc_commit")

Collaborator Author

This broke the script, kept the original solution. Efficiency is not the key here.

Comment on lines 104 to 114
found_exact=""
for j in "${!SRC_PATCH_HASHES[@]}"; do
  if [[ "${SRC_PATCH_HASHES[$j]}" == "$rc_patch_hash" ]]; then
    found_exact="${SRC_COMMITS_ARRAY[$j]}"
    break
  fi
done

if [[ -n "$found_exact" ]]; then
  src_subject=$(git log -1 --pretty=format:"%s" "$found_exact")
  src_author=$(git log -1 --pretty=format:"%an <%ae>" "$found_exact")


high

When an exact patch match is found, the script makes new git log calls to get the source commit's subject and author. This is redundant and inefficient because this information was already collected and stored in SRC_COMMIT_META during the indexing phase.

You can improve performance by refactoring this section to retrieve the metadata from the pre-populated arrays instead of calling git again.

Suggested change
found_exact_index=-1
for j in "${!SRC_PATCH_HASHES[@]}"; do
  if [[ "${SRC_PATCH_HASHES[$j]}" == "$rc_patch_hash" ]]; then
    found_exact_index=$j
    break
  fi
done
if [[ $found_exact_index -ne -1 ]]; then
  found_exact="${SRC_COMMITS_ARRAY[$found_exact_index]}"
  meta_info="${SRC_COMMIT_META[$found_exact_index]}"
  src_subject="${meta_info%__*}"
  src_author="${meta_info#*__}"

Collaborator Author

Done.

@guggero (Collaborator) left a comment

Very cool idea for a script!
Have a couple of ideas on how to improve it a bit.

if [[ "$RELEASE_LIMIT" -gt 0 ]]; then
  RELEASE_COMMITS=$(echo "$RELEASE_COMMITS" | head -n "$RELEASE_LIMIT")
fi
RELEASE_COMMITS=$(echo "$RELEASE_COMMITS" | tail -r)

Collaborator

The -r option seems to be BSD-specific; it doesn't exist on Linux. Also, doesn't tail take only the last 10 lines by default? That would mean we only look at the last 10 commits.
Could we use tac instead, which also reverses the order line by line?

Collaborator Author

Replaced with the portable RELEASE_COMMITS=$(echo "$RELEASE_COMMITS" | awk '{ lines[NR] = $0 } END { for (i = NR; i > 0; i--) print lines[i] }')
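For reference, on BSD/macOS `tail -r` without a count prints the entire input in reverse, so only the last-10-lines concern applies to GNU `tail`, which has no `-r` at all. `tac`, on the other hand, is a GNU tool that macOS does not ship. The awk one-liner from the reply works with any POSIX awk. Wrapped as a function (the name `reverse_lines` is only illustrative):

```bash
#!/usr/bin/env bash

reverse_lines() {
  # Buffer every input line, then print them from last to first.
  awk '{ lines[NR] = $0 } END { for (i = NR; i > 0; i--) print lines[i] }'
}

# Usage, mirroring the script:
#   RELEASE_COMMITS=$(echo "$RELEASE_COMMITS" | reverse_lines)
```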

fi
done

if [[ -n "$best_index" ]]; then

Collaborator

Can we also summarize below how many fuzzy matches we found?

Collaborator Author

Done.

for commit in "${SRC_COMMITS_ARRAY[@]}"; do
  author=$(git log -1 --pretty=format:"%an <%ae>" "$commit")
  subject=$(git log -1 --pretty=format:"%s" "$commit")
  meta_key="${subject}__${author}"

Collaborator

Can we take into account the author date as well? Since that shouldn't be changed by a cherry-pick or rebase.

Collaborator Author

Sure, that's a great idea!

echo ""
echo "🔍 Diff of release commit:"
echo "---------------------------------------------"
git show "$rc_commit" | sed 's/^/ /'

Collaborator

Instead of showing the release commit's diff, could we show the difference between the release commit and the closest fuzzy-match commit?

Collaborator Author

Done.

@bhandras force-pushed the fuzzy-matching-script branch from d66524a to dc19321 on July 4, 2025 at 13:47
@bhandras requested a review from guggero on July 4, 2025 at 13:47
@bhandras force-pushed the fuzzy-matching-script branch from dc19321 to 29738cf on July 4, 2025 at 14:09
@bhandras force-pushed the fuzzy-matching-script branch from 29738cf to 31dafd3 on July 4, 2025 at 14:39

@bhandras (Collaborator Author) commented Jul 4, 2025

@gemini-code-assist review

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a bash script for verifying cherry-picked commits between branches, covering both exact and fuzzy matching. The review focuses on improving robustness, performance, and correctness. Key suggestions include using a more robust method for handling commit metadata, optimizing git command usage, enhancing error handling for git fetch, and aligning the fuzzy matching logic more closely with the stated intent.

Comment on lines +104 to +107
author=$(git log -1 --pretty=format:"%an <%ae>" "$commit" 2>/dev/null) || continue
subject=$(git log -1 --pretty=format:"%s" "$commit" 2>/dev/null) || continue
authordate=$(git log -1 --pretty=format:"%ai" "$commit" 2>/dev/null) || continue
meta_key="${subject}__${author}__${authordate}"


high

Spawning three separate git log processes for each commit is inefficient. This can be combined into a single git log call. Also, using __ as a delimiter is not robust, as commit subjects or author names may contain this sequence, causing parsing issues later on. Consider using a single git log call with a more robust delimiter like the unit separator (%x1f).

Suggested change
info=$(git log -1 --pretty=format:"%s%x1f%an <%ae>%x1f%ai" "$commit" 2>/dev/null) || continue
IFS=$'\x1f' read -r subject author authordate <<< "$info"
meta_key="$info" # Use the raw, delimited output as the meta key

Comment on lines +145 to +148
src_subject="${meta_info%%__*}"
rest="${meta_info#*__}"
src_author="${rest%%__*}"
src_authordate="${rest##*__}"


high

The metadata parsing logic splits on __, which is fragile and will fail if the delimiter appears in the commit subject or author name. If you adopt the suggestion of using a robust delimiter like \x1f (unit separator), update the parsing logic here to match.

Suggested change
IFS=$'\x1f' read -r src_subject src_author src_authordate <<< "$meta_info" # Parse metadata using the robust delimiter

echo ""

echo "🔄 Fetching latest refs..."
git fetch --all --quiet || true


medium

Using || true after git fetch will suppress any errors. If the fetch fails (e.g., due to network issues), the script will continue with potentially stale data, leading to incorrect results. It's safer to remove || true and let the script exit if the fetch fails, ensuring it always operates on complete data.

Suggested change
git fetch --all --quiet

fuzzy_candidates=0

for j in "${!SRC_COMMIT_META[@]}"; do
  if [[ "${SRC_COMMIT_META[$j]}" == "$meta_key" ]]; then


medium

The fuzzy matching logic requires the author, subject, and author date to be identical to find a candidate. This is stricter than intended. Matching on authordate can be too strict, especially for rebased commits where the author date might be updated. Consider making the match less strict by not comparing the date.

Comment on lines +192 to +193
match_author=$(git log -1 --pretty=format:"%an <%ae>" "$match_commit")
match_subject=$(git log -1 --pretty=format:"%s" "$match_commit")


medium

These git log calls are redundant. The author and subject for the match_commit have already been fetched and are available in the ${SRC_COMMIT_META[$best_index]} variable. Parse this information directly from the metadata array instead.

Suggested change
meta_info="${SRC_COMMIT_META[$best_index]}"
IFS=$'\x1f' read -r match_subject match_author _ <<< "$meta_info" # Parse metadata; discard date

@gemini-code-assist bot left a comment

Code Review

The pull request introduces a script to verify cherry-picked commits between release and source branches using both fast and fuzzy matching. The script appears well-structured and addresses a useful need. However, there are several areas where the script can be improved, including the hashing algorithm, fuzzy matching logic, character encoding, and error handling.

Comment on lines +51 to +56
if command -v md5sum >/dev/null 2>&1; then
  md5sum | awk '{print $1}'
else
  md5 | awk '{print $NF}'
fi
}


high

The script uses md5sum or md5 for hashing. MD5 is considered cryptographically broken and should not be used for security-sensitive applications. While this script isn't directly security-sensitive, consider using SHA256 (sha256sum) for better collision resistance, if available, or at least warn the user that MD5 is being used and may not be reliable for large numbers of commits. If sha256sum is not available, the script should exit with an error message.

Suggested change
if command -v sha256sum >/dev/null 2>&1; then
  sha256sum | awk '{print $1}'
elif command -v md5sum >/dev/null 2>&1; then
  echo "WARNING: Using md5sum for hashing. Consider using sha256sum if available." >&2
  md5sum | awk '{print $1}'
elif command -v md5 >/dev/null 2>&1; then
  echo "WARNING: Using md5 for hashing. Consider using sha256sum if available." >&2
  md5 | awk '{print $NF}'
else
  echo "ERROR: No hashing program found (sha256sum, md5sum, or md5)." >&2
  exit 1
fi

Collaborator Author

For this specific use case md5 is fine.

Comment on lines +177 to +184
if [[ "${SRC_COMMIT_META[$j]}" == "$meta_key" ]]; then
  ((fuzzy_candidates++))
  diff=$(diff -u <(echo "$rc_patch") <(echo "${SRC_PATCHES[$j]}") || true)
  score=$(echo "$diff" | grep -vE '^(--- |\+\+\+ )' | grep -c '^[-+]')
  if [[ "$score" -lt "$best_score" ]]; then
    best_score=$score
    best_index=$j
  fi


medium

The fuzzy matching logic compares commit metadata directly. This can lead to false positives if commit messages are duplicated across different commits. Consider incorporating the commit hash itself into the comparison to reduce the likelihood of false positives. This would involve adding the commit hash to the meta_key and using it in the comparison.

Suggested change
meta_key="${subject}__${author}__${authordate}__${commit}"

Collaborator Author

Commit hash is not expected to be the same.

fuzzy_candidates=0

for j in "${!SRC_COMMIT_META[@]}"; do
  if [[ "${SRC_COMMIT_META[$j]}" == "$meta_key" ]]; then


medium

The script compares the entire SRC_COMMIT_META[$j] with $meta_key. This is prone to errors if there are slight variations in the author or date format. It's better to compare the individual components (subject, author, date) separately to allow for minor discrepancies.

Suggested change
rc_subject_base=$(echo "$meta_key" | cut -d'_' -f1)
src_subject_base=$(echo "${SRC_COMMIT_META[$j]}" | cut -d'_' -f1)
if [[ "$rc_subject_base" == "$src_subject_base" ]]; then

for j in "${!SRC_COMMIT_META[@]}"; do
  if [[ "${SRC_COMMIT_META[$j]}" == "$meta_key" ]]; then
    ((fuzzy_candidates++))
    diff=$(diff -u <(echo "$rc_patch") <(echo "${SRC_PATCHES[$j]}") || true)


medium

The diff command is used without specifying a character encoding. This can lead to incorrect results if the source and release branches use different encodings. Specify the encoding explicitly using the -a option to treat all files as text and force a line-by-line comparison.

Suggested change
diff=$(diff -au <(echo "$rc_patch") <(echo "${SRC_PATCHES[$j]}") || true)

Collaborator Author

Encodings are expected to be the same.

echo ""

echo "🔧 Check it manually (patch diff):"
echo " git diff $match_commit $rc_commit -- \$(git show --pretty=\"\" --name-only $rc_commit)"


medium

The git diff command includes a || true at the end. This prevents the script from exiting if the diff command fails (e.g., if the files have been deleted or renamed). However, it also masks potential errors that could indicate a problem with the comparison. Consider removing || true and handling the error explicitly, or logging the error message for debugging purposes.

Suggested change
git diff "$match_commit" "$rc_commit" -- $changed_files | sed 's/^/ /'
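A caution about `git diff "$match_commit" "$rc_commit" -- <files>`: it compares the full file snapshots of the two commits. Any other change made to those files on either branch therefore shows up as well. To see only how the cherry-picked change itself differs, one can diff the two commits' patches instead. `git range-diff "$match_commit~1..$match_commit" "$rc_commit~1..$rc_commit"` should also work as a built-in alternative. The sketch below uses plain `diff`; `show_patch_diff` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -euo pipefail

show_patch_diff() {
  # $1: matching source commit, $2: release commit.
  # Compares the patches the two commits introduce rather than the trees they
  # point to, so unrelated changes to the same files on either branch do not
  # show up. --format= drops the commit header; --no-color keeps output plain.
  # diff exits 1 when the patches differ, which is expected here.
  diff -u <(git show --format= --no-color "$1") \
          <(git show --format= --no-color "$2") || [[ $? -eq 1 ]]
}

# Usage: show_patch_diff "$match_commit" "$rc_commit" | sed 's/^/   /'
```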

@guggero (Collaborator) commented Jul 7, 2025

I tried running the script and it aborted after the first non-exact match. According to ChatGPT, it's the combination of set -euo pipefail and the continue in the loop, which causes some variables to be treated as undeclared.
Commenting out set -euo pipefail fixed it for me, but just removing -u might also help.

@bhandras (Collaborator Author) commented Jul 7, 2025


Thanks for the feedback! I think on macOS (where I developed and tested it), bash is more forgiving and continues just fine. I'll test it on Linux too to make sure the behavior is consistent.
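One plausible cause, offered as a hypothesis rather than a confirmed diagnosis, is the counter increment. The fuzzy loop increments its counter with `((fuzzy_candidates++))`, and a post-increment evaluates to the old value. On the first candidate the expression is therefore 0, `((...))` returns status 1, and `set -e` treats that as a failure. As far as we know, bash releases before 4.1 did not apply `set -e` to `((...))`, and macOS still ships bash 3.2 as `/bin/bash`. That would explain why the script ran on a Mac but stopped on Linux right at the first non-exact match. It would also explain why removing `set -euo pipefail` helped, whereas dropping only `-u` probably would not. An assignment with arithmetic expansion always returns 0:

```bash
#!/usr/bin/env bash
set -euo pipefail

fuzzy_candidates=0
for _ in a b c; do
  # ((fuzzy_candidates++)) would return status 1 on the first pass (the old
  # value is 0) and abort under set -e on newer bash. An assignment with
  # arithmetic expansion returns 0 regardless of the resulting value.
  fuzzy_candidates=$((fuzzy_candidates + 1))
done
echo "Fuzzy candidates: $fuzzy_candidates"
```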
