
[iceberg] When using the Hadoop catalog to query tables created by Spark, the column names are incorrect #26133

@PingLiuPing

Description


Your Environment

  • Presto version used:
  • Storage (HDFS/S3/GCS..):
  • Data source and connector used:
  • Deployment (Cloud or On-prem):
  • Pastebin link to the complete debug logs:

Use Spark to create an Iceberg table as follows:

spark-sql (default)>
> show create table iceberg.emp_mix;
CREATE TABLE iceberg.emp_mix (
emp_name STRING,
emp_LOCATION STRING,
EMP_ID INT,
dep_id STRING,
u_ID INT)
USING iceberg
LOCATION 'file:///work/opensource/presto/presto-native-execution/data/iceberg_data/HADOOP/iceberg/emp_mix'
TBLPROPERTIES (
'current-snapshot-id' = '6820088206506591358',
'format' = 'iceberg/parquet',
'format-version' = '2',
'write.parquet.compression-codec' = 'zstd')

Note that the column names are mixed case.

Next, use Prestissimo to query the table:

presto:iceberg> show create table emp_mix;
Create Table

CREATE TABLE iceberg.iceberg.emp_mix (
"emp_name" varchar,
"emp_location" varchar,
"emp_id" integer,
"dep_id" varchar,
"u_id" integer
)
WITH (
"format-version" = '2',
location = 'file:///Users/pingliu/work/opensource/presto/presto-native-execution/data/iceberg_data/HADOOP/iceberg/emp_mix',
"read.split.target-size" = 134217728,
"write.delete.mode" = 'copy-on-write',
"write.format.default" = 'PARQUET',
"write.metadata.delete-after-commit.enabled" = false,
"write.metadata.metrics.max-inferred-column-defaults" = 100,
"write.metadata.previous-versions-max" = 100,
"write.update.mode" = 'copy-on-write'
)
(1 row)

All of the column names are lower case, even though the Iceberg table metadata file records them in mixed case.

The Parquet data file metadata also stores the column names in mixed case.

Expected Behavior

Current Behavior

Possible Solution

Steps to Reproduce

Screenshots (if appropriate)

Context
