Querying a Parquet file is 4 times slower than clickhouse local #115

Open
@l1t1


Describe the situation
SELECT avg(i) FROM file('/data/t.parquet') group by round(log10(i));
chdb takes 400 s to run this query; clickhouse local takes 100 s.
How to reproduce

  • Which ClickHouse server version to use: 23.6
  • Which interface to use, if it matters: CLI.py
  • Non-default settings, if any
  • CREATE TABLE statements for all tables involved:
    select number::int i FROM numbers_mt(1,1000000000)t into outfile '/data/t.parquet';
  • Sample data for all these tables, use clickhouse-obfuscator if necessary
  • Queries to run that lead to slow performance:
    SELECT avg(i) FROM file('/data/t.parquet') group by round(log10(i));
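
To make the comparison easier to repeat, the two timings could be collected with a small harness like the one below. This is only a sketch: the helper names (`time_query`, `run_with_chdb`, `run_with_clickhouse_local`) are made up for illustration, it assumes the `chdb` Python package is installed and a `clickhouse` binary is on PATH, and it is not the way the numbers in this report were measured.

```python
import subprocess
import time
from typing import Callable

# The slow query from this report.
QUERY = "SELECT avg(i) FROM file('/data/t.parquet') GROUP BY round(log10(i))"


def time_query(run_query: Callable[[str], object], sql: str, repeats: int = 1) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(sql)
        best = min(best, time.perf_counter() - start)
    return best


def run_with_chdb(sql: str) -> object:
    # Imported lazily so the timing helper itself works without chdb installed.
    import chdb

    return chdb.query(sql, "CSV")


def run_with_clickhouse_local(sql: str) -> object:
    # Assumption: a `clickhouse` binary is available on PATH.
    completed = subprocess.run(
        ["clickhouse", "local", "--query", sql],
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout
```

Calling `time_query(run_with_chdb, QUERY)` and `time_query(run_with_clickhouse_local, QUERY)` on the same machine gives the two numbers to compare. With runs that take minutes, `repeats=1` is probably enough; a higher value mainly helps to smooth out file-cache effects on smaller files.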

Expected performance
I hope chdb runs as fast as clickhouse local.
Additional context
By the way, writing the file shows no such gap. For
select number::int i FROM numbers_mt(1,1000000000)t into outfile '/data/t.parquet';
chdb runs as fast as clickhouse local.

Labels: Arrow (Apache Arrow support), enhancement (New feature or request), help wanted (Extra attention is needed)
