TarReader throws on various archives that other tools accept #74316

Closed
@danmoseley

Description

I tried opening:

  1. each of the tar files used to test Golang's tar package (here with details about each in the tests here).
  2. each of the tar files used to test node-tar, found here.
  3. each of the tar files used to test libarchive, found here. Note I had to uudecode these.

Note all the above have permissive licenses so it may be possible to borrow these tars for our test assets.

I used the test code below to open each archive and ignored those that opened successfully. For those that failed, I checked whether other tools could open them. The interesting cases are where other tools (particularly GNU tar) can open the archive but we cannot. Note: I mostly didn't extract the entries, I just checked that they could be listed. In some cases the tar can be listed but extraction will fail.

test code I used
// See https://aka.ms/new-console-template for more information
using System.Formats.Tar;
using Xunit;

public static class C
{

    public async static Task Main()
    {
        List<Task> tasks = new();
        foreach (string path in Directory.EnumerateFiles(@"C:\git\go\src\archive\tar\testdata", "*.tar"))
        {
            tasks.Add(Task.Run(async () =>
            {
                TarEntry? entry = null;

                try
                {
                    //Console.WriteLine($"{path} opening...");
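                    // Open read-only; FileMode.Open alone requests read/write access,
                    // which fails on read-only test assets.
                    using FileStream fs = new(path, FileMode.Open, FileAccess.Read);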
                    using TarReader reader = new(fs, leaveOpen: false);

                    while ((entry = await reader.GetNextEntryAsync()) != null)
                    {
                        var ms = new MemoryStream();

                        Assert.NotEmpty(entry.Name);
                        Assert.True(Enum.IsDefined(entry.EntryType));
                        Assert.True(Enum.IsDefined(entry.Format));

                        if (entry.EntryType == TarEntryType.Directory)
                            continue;

                        var ds = entry.DataStream;
                        if (ds != null && ds.Length > 0)
                        {
                            ds.CopyTo(ms);
                        }
                    }
                }
                catch (Exception ex) //when (!(ex is FormatException))
                {
                    Console.WriteLine($"{path} opening {entry?.Name} threw {ex.Message}");
                }
            }));
        }

        await Task.WhenAll(tasks);
    }
}
 
| source | file | issue | gnu tar | 7z | golang | .NET | .NET exception |
|---|---|---|---|---|---|---|---|
| golang | gnu-multi-hdrs.tar | duplicate headers | reads one | reads one w/warning | reads one | ERROR | A metadata entry of type 'LongPath' was unexpectedly found after a metadata entry of type 'LongPath'. |
| golang | gnu-incremental.tar | incremental format | reads ok | reads ok | | ERROR | Unable to read beyond the end of the stream. |
| golang | invalid-go17.tar | ?? | reads ok | reads ok | reads ok | ERROR | Could not find any recognizable digits. |
| golang | hdr-only.tar | just header | reads with errors | reads ok | reads ok | ERROR | Additional non-parsable characters are at the end of the string. |
| golang | nil-uid.tar | zero uid | reads ok | reads w/warnings | reads ok | ERROR | Unable to read beyond the end of the stream. |
| golang | pax-multi-hdrs.tar | 2 headers | reads ok | reads w/warnings | reads ok | ERROR | A metadata entry of type 'ExtendedAttributes' was unexpectedly found after a metadata entry of type 'ExtendedAttributes'. |
| golang | pax-bad-mtime-file.tar | bad modified time | reads ok | reads w/warnings | | ERROR | Unable to read beyond the end of the stream. |
| golang | pax-pos-size-file.tar | ? | reads ok | reads w/warnings | reads ok | ERROR | Unable to read beyond the end of the stream. |
| golang | v7.tar | v7 | reads ok | reads ok | reads ok | ERROR | Could not find any recognizable digits. |
| golang | sparse-formats.tar | something about sparseness | reads ok | reads ok | | ERROR | Additional non-parsable characters are at the end of the string. |
| golang | ustar-file-reg.tar | non-zero device numbers. | reads ok | reads ok | | ERROR | Unable to read beyond the end of the stream. |
| golang | writer-big.tar | truncated huge | ERROR | reads ok | | ERROR | Could not find any recognizable digits. |
| golang | pax-path-hdr.tar | ? | reads empty | ERROR | reads header | ERROR | Unable to read beyond the end of the stream. |
| golang | writer-big-long.tar | truncated huge | ERROR | reads w/ unexpected end of data | reads ok | ERROR | Unable to read beyond the end of the stream. |
| mine | huge.tar | dd if=/dev/zero bs=1G count=16 > huge.tar | reads | | | ERROR | Value was either too large or too small for a UInt32 |
| golang | issue10968.tar | garbled header | ERROR | | | ERROR (but OK) | Could not find any recognizable digits. |
| golang | issue11169.tar | ?? | ERROR | | | ERROR (but OK) | Additional non-parsable characters are at the end of the string. |
| golang | neg-size.tar | negative size | ERROR | refuses | ERROR | ERROR (but OK) | Could not find any recognizable digits. |
| golang | pax-bad-hdr-file.tar | bad header | reads with errors | reads ok | ERROR | ERROR (but OK) | Unable to read beyond the end of the stream. |
| node | long-pax.tar | 120 byte filename (pax limit 100) | reads headers | reads w/ unexpected end of data | | ERROR | 120-byte-filename-cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc threw Unable to read beyond the end of the stream. |
| node | next-file-has-long.tar | link to 170 byte name in GNU | | | | ERROR | Entry 'NextFileHasLongPath' was expected to be in the GNU format, but did not have the expected version data. |
| node | path-missing.tar | empty name | "Substituting `.' for empty member name" (but not clear this is useful..) | silently uses tar file name | | ERROR on extraction | Cannot create 'c:\tar' because a file or directory with the same name already exists (NOTE -- we should probably fix to fail earlier, in GetDestinationAndLinkPaths()) |
| node | links-strip.tar | ?symlink and hardlinks | reads ok | reads w/ unexpected end of data | | ERROR | Unable to read beyond the end of the stream. |
| mine | empty.tar | 0 bytes | reads OK | reads ok | | OK | |
| libarchive | test_compat_gtar_2.tar | huge gid | reads OK | reads ok | | ERROR | Could not find any recognizable digits. |
| libarchive | test_compat_perl_archive_tar.tar | ? | reads OK | reads ok | | ERROR | Could not find any recognizable digits. |
| libarchive | test_compat_gtar_1.tar | 200 byte filenames and symlink? | reads OK | reads ok | | ERROR | Could not find any recognizable digits. |
| libarchive | test_compat_plexus_archiver_tar.tar | | reads OK w/tar: A lone zero block at 3 | reads w/ There are some data after the end of the payload data | | ERROR | Could not find any recognizable digits. |
| libarchive | test_compat_solaris_tar_acl.tar | | reads OK w/Unknown file type ‘A’ | reads ok | | OK | (no exception, but unexpected TarEntryType 65 = 'A' .. A custom extension) |
| libarchive | test_compat_tar_hardlink_1.tar | | reads OK | reads w/ unexpected end of data | | ERROR | Could not find any recognizable digits. |
| libarchive | test_read_format_gtar_sparse_1_17_posix00.tar | | reads OK | reads ok | | ERROR | The entry './PaxHeaders.38659/sparse' has a duplicate extended attribute. |
| libarchive | test_read_format_tar_invalid_pax_size.tar | | ERRORS | ERROR | | ERROR | Could not find any recognizable digits. |
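
Many of the rows above fail with "Could not find any recognizable digits" or "Additional non-parsable characters are at the end of the string", which read like number-parsing failures. Numeric tar header fields are ASCII octal, and archives in the wild pad them with leading or trailing spaces and NULs, so a strict parser would reject fields that other tools accept. Whether that is the actual cause in TarReader is an assumption, not something confirmed above. The sketch below shows what a more tolerant parse could look like. `OctalFieldSketch` and `ParseOctalField` are hypothetical names, not part of System.Formats.Tar, and the sketch does not handle the binary (base-256) encoding that GNU tar can use for large values.

```csharp
using System;

public static class OctalFieldSketch
{
    // Hypothetical helper: parses an ASCII octal header field, tolerating
    // leading and trailing spaces and NUL bytes used as padding.
    public static long ParseOctalField(ReadOnlySpan<byte> field)
    {
        int start = 0;
        int end = field.Length;

        // Skip padding at the front of the field.
        while (start < end && (field[start] == (byte)' ' || field[start] == 0))
        {
            start++;
        }

        // Skip padding and terminators at the back of the field.
        while (end > start && (field[end - 1] == (byte)' ' || field[end - 1] == 0))
        {
            end--;
        }

        long value = 0;
        for (int i = start; i < end; i++)
        {
            byte b = field[i];
            if (b < (byte)'0' || b > (byte)'7')
            {
                // Still reject genuinely garbled fields.
                throw new FormatException($"Unexpected byte 0x{b:X2} in octal field.");
            }

            // checked: overflow throws instead of silently wrapping.
            value = checked(value * 8 + (b - (byte)'0'));
        }

        // A field containing only padding is treated as zero (an assumption).
        return value;
    }
}
```

With this sketch, the bytes of `"0000644\0"` and `" 644 "` both parse to 420 (octal 644), and a field of only NULs parses to 0, while a field containing other characters still throws `FormatException`.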

Possibly some of these are expected limitations, but for the others we should add checkboxes and work through and fix them.
