Local object store accepts file:/// as base path, but LocalStore returns meta without the prefix. #1923

Closed
@Igosuki

Description

Describe the bug
One can register a table with the file scheme (file://), which in turn allows the listing table to list files and discover partitions.
Unfortunately, LocalStore returns a FileMetaStream in which each SizedFile path has the scheme prefix stripped. That would be fine, except that `datafusion::datasource::listing::helpers::parse_partitions_for_path` calls strip_prefix on the file path using the original path the table was registered with, which still contains the scheme, so the prefix does not match.
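
To make the mismatch concrete, here is a minimal sketch using only the standard library. `relative_to_table` is a hypothetical stand-in for the prefix-stripping step, not the actual DataFusion code, and the file name `data.parquet` is made up for illustration.

```rust
/// Hypothetical stand-in for the prefix-stripping step described above:
/// returns the part of `file_path` that follows `table_path`, if any.
fn relative_to_table<'a>(table_path: &str, file_path: &'a str) -> Option<&'a str> {
    file_path.strip_prefix(table_path)
}

fn main() {
    // Path as reported by the store: the scheme is already gone.
    let file = "/tmp/listing_table/part1=value1/data.parquet";

    // Table path as registered, scheme included: the prefix cannot match.
    let with_scheme = relative_to_table("file:///tmp/listing_table", file);
    // Table path without the scheme: the prefix matches.
    let without_scheme = relative_to_table("/tmp/listing_table", file);

    println!("with scheme:    {:?}", with_scheme);
    println!("without scheme: {:?}", without_scheme);
}
```

With the scheme in the table path the result is `None`, which is consistent with the query below finding no rows.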

There are two ways to fix this: either strip the scheme off the path in the registered table as well (it would probably be best to let the ObjectStore implementation do that), or enhance FileMeta to use a URI instead of just a path.
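
The first option could look roughly like the sketch below. Both `strip_file_scheme` and `partition_values` are hypothetical helpers written for this illustration; they are not DataFusion APIs, and a real fix would likely live inside the ObjectStore implementation as suggested above.

```rust
/// Hypothetical helper: drop a leading `file://` scheme, leaving other
/// inputs untouched.
fn strip_file_scheme(uri: &str) -> &str {
    uri.strip_prefix("file://").unwrap_or(uri)
}

/// Hypothetical helper: normalize the registered table URI first, then
/// collect `column=value` pairs from the rest of the file path.
fn partition_values<'a>(table_uri: &str, file_path: &'a str) -> Option<Vec<(&'a str, &'a str)>> {
    let table_path = strip_file_scheme(table_uri);
    let rest = file_path.strip_prefix(table_path)?;
    Some(
        rest.split('/')
            // Segments without '=' (empty segments, the file name) are skipped.
            .filter_map(|segment| segment.split_once('='))
            .collect(),
    )
}

fn main() {
    let parts = partition_values(
        "file:///tmp/listing_table",
        "/tmp/listing_table/part1=value1/data.parquet",
    );
    println!("{:?}", parts);
}
```

Normalizing in one place keeps the registered path and the paths returned by the store comparable, at the cost of losing the scheme; the URI-based alternative would preserve it.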

To Reproduce
Steps to reproduce the behavior:

/tmp/listing_table/part1=value1/ and /tmp/listing_table/part1=value2/
should contain one parquet file each

let mut ctx = ExecutionContext::new();
let listing_options = ListingOptions {
    file_extension: "parquet".to_string(),
    format: Arc::new(ParquetFormat::default()),
    // Partition column names are owned strings.
    table_partition_cols: vec!["part1".to_string()],
    collect_stat: true,
    target_partitions: 8,
};
// Register the table with a scheme-prefixed path; this is what triggers the bug.
ctx.register_listing_table(
    "my_table",
    "file:///tmp/listing_table",
    listing_options,
    None,
)
.await?;

let df = ctx.sql("select count(*) from my_table").await?;
let rb = df.collect().await?;
eprintln!("rb = {:?}", rb);

Expected behavior
The above should count the rows in the files correctly. With the current behavior it returns 0.

Additional context
I'm trying to be consistent in my project, so I use schemes for both local and remote files. Finding this bug required a lot of debugging.

Labels: bug
