Copybara fails on historical non-ASCII filenames #336

@sainathseelam

When running a migration with Copybara on a Git repository, Copybara fails with a java.nio.file.InvalidPathException due to a historical file whose name contained non-ASCII characters (e.g., ä).

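Even though the file has been renamed in all current branches and does not exist in the latest commit, Copybara still scans the Git history and encounters the old filename. On Unix systems, Java NIO fails to convert this legacy filename to a java.nio.file.Path (the stack trace below shows the failure in Paths.get, called from WorkflowRunHelper$ChangeMigrator.shouldSkipChange), so the migration aborts.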

java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /docs/images/Hauptprozess_Gesamtsteuerger?te_Test_Team_bright_transparent.drawio.png
        at java.base/sun.nio.fs.UnixPath.encode(UnixPath.java:129)
        at java.base/sun.nio.fs.UnixPath.<init>(UnixPath.java:76)
        at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:312)
        at java.base/java.nio.file.Path.of(Path.java:148)
        at java.base/java.nio.file.Paths.get(Paths.java:69)
        at com.google.copybara.WorkflowRunHelper$ChangeMigrator.shouldSkipChange(WorkflowRunHelper.java:349)
        at com.google.copybara.WorkflowRunHelper$ChangeMigrator.skipChange(WorkflowRunHelper.java:327)
        at com.google.copybara.WorkflowMode.lambda$filterChanges$1(WorkflowMode.java:509)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:178)
        at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:1024)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
        at com.google.copybara.WorkflowMode.filterChanges(WorkflowMode.java:510)
        at com.google.copybara.WorkflowMode$1.run(WorkflowMode.java:110)
        at com.google.copybara.Workflow.run(Workflow.java:319)
        at com.google.copybara.MigrateCmd.run(MigrateCmd.java:91)
        at com.google.copybara.MigrateCmd.run(MigrateCmd.java:68)
        at com.google.copybara.Main.runInternal(Main.java:240)
        at com.google.copybara.Main.run(Main.java:140)
        at com.google.copybara.Main.main(Main.java:118)
ERROR: Unexpected error (please file a bug against copybara): Malformed input or input contains unmappable characters: /docs/images/Hauptprozess_Gesamtsteuerger?te_Test_Team_bright_transparent.drawio.png (java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /docs/images/Hauptprozess_Gesamtsteuerger?te_Test_Team_bright_transparent.drawio.png)
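
For context, "Malformed input or input contains unmappable characters" is the message UnixPath.encode reports when the path string cannot be encoded in the charset the JVM uses for file names. The following is a minimal sketch of that condition, not Copybara code. It assumes (this is not confirmed in the report) that the JVM ran with an ASCII-only locale such as POSIX/C; the class name, method name, and sample filename are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FilenameEncodingCheck {
    // Returns true if every character of the name can be encoded in the given charset.
    static boolean encodable(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        // Hypothetical filename containing 'ä' (U+00E4), similar to the one in the report.
        String name = "docs/images/Ger\u00e4te.png";
        System.out.println("US-ASCII: " + encodable(name, StandardCharsets.US_ASCII));
        System.out.println("UTF-8:    " + encodable(name, StandardCharsets.UTF_8));
    }
}
```

If the assumption holds, the name is not encodable in US-ASCII but is encodable in UTF-8, which would explain why the same history could be processed under a UTF-8 locale.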

Key points:

The issue occurs in historical commits only; the current branch uses a corrected ASCII-only filename.

Git configurations like core.precomposeunicode and core.quotepath do not help.

Using --last-rev in Copybara does not bypass the problem, as Copybara still inspects previous commits.

The problem appears to be specific to Java on Unix-based systems, where a filename that cannot be encoded in the charset the JVM uses for file names cannot be converted to a java.nio.file.Path.
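
One way to narrow this down is to print the encoding settings of the JVM that runs Copybara. The sketch below is a diagnostic aid only. It assumes an OpenJDK-style runtime where the sun.jnu.encoding system property reflects the charset used for file names; that property is an implementation detail and may be absent on other JVMs. The class name is hypothetical.

```java
import java.nio.charset.Charset;

public class EncodingReport {
    // Collects the settings that commonly influence how file names are encoded.
    static String describe() {
        return "sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding")
            + ", file.encoding=" + System.getProperty("file.encoding")
            + ", defaultCharset=" + Charset.defaultCharset()
            + ", LANG=" + System.getenv("LANG")
            + ", LC_ALL=" + System.getenv("LC_ALL");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the report shows an ASCII-only charset (for example ANSI_X3.4-1968 under LANG=C), running the same JVM under a UTF-8 locale may be worth trying before changing anything in the repository history.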

Impact:
Copybara cannot migrate repositories that previously contained non-ASCII filenames, even if those files no longer exist, preventing migrations of otherwise valid commits.

Desired behavior:
Copybara should either:

Skip historical commits with filenames that cannot be represented on the filesystem, or

Provide a configuration option to ignore legacy non-ASCII filenames in Git history.
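
The first option could look roughly like the sketch below: convert each historical filename defensively and treat an unrepresentable one as "skip" instead of letting InvalidPathException abort the run. This is an illustration of the idea only, not Copybara's implementation; SafePaths and tryParse are hypothetical names.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class SafePaths {
    // Returns the parsed path, or an empty Optional if the platform cannot represent it.
    static Optional<Path> tryParse(String rawPath) {
        try {
            return Optional.of(Paths.get(rawPath));
        } catch (InvalidPathException e) {
            // The caller decides whether to skip the file, skip the change, or log a warning.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A NUL character is rejected by Paths.get, so it serves as a
        // locale-independent stand-in for an unrepresentable filename.
        System.out.println(tryParse("docs/images/ok.png").isPresent());
        System.out.println(tryParse("docs/images/bad\0name.png").isPresent());
    }
}
```

Returning an Optional keeps the decision with the caller, so the same helper could back either a silent skip or a configurable warning.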
