-
Notifications
You must be signed in to change notification settings - Fork 144
Remove fast export munging #191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
FTR you could reopen the previous PR and force-push... ;-) |
@dscho: I tried to reopen the previous one, but the "reopen" button was greyed out for me. So, I created this new one instead. |
...and apparently it was grayed out because I had deleted the related branch; if I had re-created it... oh well, that's what I get for using two similarly named branches. |
TBH I do not know whether that works. I did know that PRs are automatically closed when the corresponding branch is deleted, and failed to mention that: sorry! |
b5d8f1a
to
d8be4ee
Compare
This test used an author with non-ascii characters in the name, but no special commit message. It then grep'ed for those non-ascii characters, but those are guaranteed to exist regardless of the reencoding process since the reencoding only affects the commit message, not the author or committer names. As such, the test would work even if the re-encoding process simply stripped the commit message entirely. Modify the test to actually check that the reencoding into UTF-8 worked. Signed-off-by: Elijah Newren <[email protected]>
Since git supports commit messages with an encoding other than UTF-8, allow fast-import to import such commits. This may be useful for folks who do not want to reencode commit messages from an external system, and may also be useful to achieve reversible history rewrites (e.g. sha1sum <-> sha256sum transitions or subtree work) with git repositories that have used specialized encodings in their commit history. Signed-off-by: Elijah Newren <[email protected]>
When fast-export encounters a commit with an 'encoding' header, it tries to reencode in UTF-8 and then drops the encoding header. However, if it fails to reencode in UTF-8 because e.g. one of the characters in the commit message was invalid in the old encoding, then we need to retain the original encoding or otherwise we lose information needed to understand all the other (valid) characters in the original commit message. Signed-off-by: Elijah Newren <[email protected]>
The find_encoding() function returned the encoding used by a commit message, returning a default of git_commit_encoding (usually UTF-8). Although the current code does not differentiate between a commit which explicitly requested UTF-8 and one where we just assume UTF-8 because no encoding is set, it will become important when we try to preserve the encoding header. Since is_encoding_utf8() returns true when passed NULL, we can just return NULL from find_encoding() instead of returning git_commit_encoding. Signed-off-by: Elijah Newren <[email protected]>
…sted Automatic re-encoding of commit messages (and dropping of the encoding header) hurts attempts to do reversible history rewrites (e.g. sha1sum <-> sha256sum transitions, some subtree rewrites), and seems inconsistent with the general principle followed elsewhere in fast-export of requiring explicit user requests to modify the output (e.g. --signed-tags=strip, --tag-of-filtered-object=rewrite). Add a --reencode flag that the user can use to specify, and like other fast-export flags, default it to 'abort'. Signed-off-by: Elijah Newren <[email protected]>
d8be4ee
to
d18f03d
Compare
Let's see if this update also works on Windows.