Description
When running a migration with Copybara on a Git repository, Copybara fails with a java.nio.file.InvalidPathException because of a file in the history whose name contained a non-ASCII character (e.g., ä).
Even though the file has since been renamed on all current branches and no longer exists in the latest commit, Copybara still scans the Git history and encounters the old filename. On Unix systems, Java NIO cannot convert this legacy filename into a java.nio.file.Path, so the migration fails.
java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /docs/images/Hauptprozess_Gesamtsteuerger?te_Test_Team_bright_transparent.drawio.png
at java.base/sun.nio.fs.UnixPath.encode(UnixPath.java:129)
at java.base/sun.nio.fs.UnixPath.<init>(UnixPath.java:76)
at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:312)
at java.base/java.nio.file.Path.of(Path.java:148)
at java.base/java.nio.file.Paths.get(Paths.java:69)
at com.google.copybara.WorkflowRunHelper$ChangeMigrator.shouldSkipChange(WorkflowRunHelper.java:349)
at com.google.copybara.WorkflowRunHelper$ChangeMigrator.skipChange(WorkflowRunHelper.java:327)
at com.google.copybara.WorkflowMode.lambda$filterChanges$1(WorkflowMode.java:509)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:178)
at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:1024)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
at com.google.copybara.WorkflowMode.filterChanges(WorkflowMode.java:510)
at com.google.copybara.WorkflowMode$1.run(WorkflowMode.java:110)
at com.google.copybara.Workflow.run(Workflow.java:319)
at com.google.copybara.MigrateCmd.run(MigrateCmd.java:91)
at com.google.copybara.MigrateCmd.run(MigrateCmd.java:68)
at com.google.copybara.Main.runInternal(Main.java:240)
at com.google.copybara.Main.run(Main.java:140)
at com.google.copybara.Main.main(Main.java:118)
ERROR: Unexpected error (please file a bug against copybara): Malformed input or input contains unmappable characters: /docs/images/Hauptprozess_Gesamtsteuerger?te_Test_Team_bright_transparent.drawio.png (java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /docs/images/Hauptprozess_Gesamtsteuerger?te_Test_Team_bright_transparent.drawio.png)
Key points:
The issue occurs in historical commits only; the current branch uses a corrected ASCII-only filename.
Setting Git options such as core.precomposeunicode and core.quotepath does not help.
Using --last-rev in Copybara does not bypass the problem, as Copybara still inspects previous commits.
The problem is specific to Java on Unix-based systems, where java.nio.file.Path rejects file names containing characters that cannot be encoded in the JVM's file-name encoding.
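As a quick diagnostic, the sketch below checks whether the legacy file name can be encoded in a given charset. On Linux the JDK converts path strings using the JVM's file-name encoding, reported by the `sun.jnu.encoding` system property and typically derived from the process locale. If that encoding is ASCII (as it often is under a `C`/`POSIX` locale), a name containing `ä` cannot be encoded and `Paths.get` throws the InvalidPathException seen in the stack trace. This is an assumption about the environment, not something confirmed in this report. The class name is illustrative, and reading the `?` in the log as `ä` is a reconstruction.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FileNameEncodingCheck {

    // Returns true if every character of fileName can be encoded in the given charset.
    static boolean canEncode(String fileName, Charset charset) {
        return charset.newEncoder().canEncode(fileName);
    }

    public static void main(String[] args) {
        // Assumption: the '?' in the log stands for U+00E4 (a with diaeresis).
        // The Unicode escape keeps this source file pure ASCII.
        String legacyName =
                "/docs/images/Hauptprozess_Gesamtsteuerger\u00e4te_Test_Team_bright_transparent.drawio.png";

        // Charset name this JVM reports for converting file names.
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));

        // Compare against a charset that cannot represent the name and one that can.
        System.out.println("encodable in US-ASCII: " + canEncode(legacyName, StandardCharsets.US_ASCII));
        System.out.println("encodable in UTF-8:    " + canEncode(legacyName, StandardCharsets.UTF_8));
    }
}
```

If `sun.jnu.encoding` prints an ASCII charset (for example `ANSI_X3.4-1968`), running Copybara under a UTF-8 locale may be worth trying. This has not been verified against this report.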
Impact:
Copybara cannot migrate repositories whose history ever contained non-ASCII filenames, even if those files no longer exist, which blocks the migration of otherwise valid commits.
Desired behavior:
Copybara should either:
Skip historical commits with filenames that cannot be represented on the filesystem, or
Provide a configuration option to ignore legacy non-ASCII filenames in Git history.
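The first option could look roughly like the following minimal sketch. Instead of letting InvalidPathException escape, each changed file name is converted defensively. Names the JVM cannot represent are collected so the change can be reported or skipped. `TolerantPaths`, `tryToPath` and `representablePaths` are hypothetical names chosen for illustration; this is not Copybara's actual code or API. Hooking such a check in around the `Paths.get` call in `shouldSkipChange` (visible in the stack trace) is an assumption. The demo uses a NUL character as a stand-in for an unrepresentable name, because `Paths.get` rejects it regardless of the locale.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class TolerantPaths {

    // Hypothetical helper: converts a file name taken from Git history into a Path,
    // or returns Optional.empty() if this JVM cannot represent it.
    static Optional<Path> tryToPath(String fileName) {
        try {
            return Optional.of(Paths.get(fileName));
        } catch (InvalidPathException e) {
            return Optional.empty();
        }
    }

    // Hypothetical filter: returns the representable paths of a change and adds
    // the unrepresentable names to `rejected`, so the caller can warn or skip the
    // change instead of aborting the whole migration.
    static List<Path> representablePaths(List<String> changedFiles, List<String> rejected) {
        List<Path> result = new ArrayList<>();
        for (String file : changedFiles) {
            Optional<Path> path = tryToPath(file);
            if (path.isPresent()) {
                result.add(path.get());
            } else {
                rejected.add(file);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rejected = new ArrayList<>();
        // The second name contains a NUL character, which Paths.get always rejects.
        List<Path> kept = representablePaths(
                List.of("docs/images/overview.png", "docs/images/bad\0name.png"), rejected);
        System.out.println("kept: " + kept);
        System.out.println("rejected count: " + rejected.size());
    }
}
```

Returning an `Optional` rather than rethrowing leaves the policy to the caller. The caller can skip the commit outright, or, behind a configuration option, ignore only the unrepresentable paths.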