fix: 3145 add `read_parquet(use_arrow: bool)` #3149

zilto · 2025-09-29T19:09:18Z

Attempt to fix #3145

Changes

The main change is a single line: `yield batch.to_pylist()` -> `yield batch if use_pyarrow else batch.to_pylist()`
Added basic tests for all readers (previously untested)
Fixed the typing of the `items` argument from the incorrect `Iterator` to `Iterable`
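
A minimal sketch of the one-line change, not the actual dlt source. `FakeBatch` and `iter_parquet_items` are hypothetical names used only for illustration. `FakeBatch` stands in for a `pyarrow.RecordBatch` and models only its `to_pylist()` method, so the example runs without pyarrow installed. The input is annotated as `Iterable`, which is also the reasoning behind the typing fix: a plain list is an `Iterable` but not an `Iterator`.

```python
from typing import Any, Dict, Iterable, Iterator, List, Union


class FakeBatch:
    """Hypothetical stand-in for pyarrow.RecordBatch (only to_pylist is modeled)."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows

    def to_pylist(self) -> List[Dict[str, Any]]:
        # Mirrors RecordBatch.to_pylist(): one dict per row
        return list(self._rows)


def iter_parquet_items(
    batches: Iterable[FakeBatch], use_pyarrow: bool = False
) -> Iterator[Union[FakeBatch, List[Dict[str, Any]]]]:
    """Yield each batch as-is, or converted to Python dicts (the old behavior)."""
    for batch in batches:
        # The conditional described in the PR: skip the conversion when requested
        yield batch if use_pyarrow else batch.to_pylist()


# A list is Iterable but not an Iterator, so `Iterable` is the accurate annotation
batches = [FakeBatch([{"id": 1}, {"id": 2}]), FakeBatch([{"id": 3}])]
as_python = list(iter_parquet_items(batches))
as_arrow = list(iter_parquet_items(batches, use_pyarrow=True))
```

With real pyarrow, the batches would presumably come from something like `pyarrow.parquet.ParquetFile(f).iter_batches(batch_size=chunksize)`; skipping `to_pylist()` avoids materializing every row as a Python dict.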

Future work

unify reader signatures
implement the CSV reader with pyarrow instead of pandas, given that pyarrow is a required dependency of dlt
set the default to `read_parquet(use_pyarrow=True)` instead of `False`, because it should be faster and pyarrow is a required dependency
add a `read_parquet_duckdb()` reader OR automatically use duckdb inside `read_parquet()` if duckdb is available
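
The last item could be approached with a simple availability check. The sketch below is only an illustration of that idea: `module_available` and `pick_parquet_backend` are hypothetical helpers, and the "prefer duckdb, fall back to pyarrow" policy is an assumption rather than anything decided in this PR.

```python
import importlib.util
from typing import Callable


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_parquet_backend(
    is_available: Callable[[str], bool] = module_available,
) -> str:
    """Hypothetical policy: prefer duckdb when installed, else use pyarrow."""
    # pyarrow is a required dependency of dlt, so it is the safe fallback
    return "duckdb" if is_available("duckdb") else "pyarrow"


backend = pick_parquet_backend()
```

Passing the availability check in as a parameter keeps the selection logic testable without installing or uninstalling duckdb.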

rudolfix

LGTM! Common tests are not passing on pendulum 2.0: there's no `Timezone` object there.

zilto · 2025-09-30T14:18:36Z

@rudolfix any objection to setting `use_pyarrow=True` as the default? A user reported a 20x performance improvement (from 10 min to under 1 min) when loading big parquet files.

zilto · 2025-09-30T20:37:45Z

> @rudolfix any objection to setting `use_pyarrow=True` as the default? A user reported a 20x performance improvement (from 10 min to under 1 min) when loading big parquet files.

After discussion, we decided to keep `use_pyarrow=False` as the default for full backwards compatibility.

There could be edge cases where Python and PyArrow normalization and type inference differ. Such differences could evolve the schema or break pipelines.

TODO

Once we evaluate some of these edge cases, we should add a deprecation warning akin to:

In dlt==X.Y.Z, we introduced `use_pyarrow: bool` with default `False`. Set it to `True` for better performance. In dlt==X.Y.Z, the default will be `use_pyarrow=True`. This should cause no errors, but you can try `use_pyarrow=True` right now to validate it.
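
One way such a warning could be wired up is sketched below, assuming a `None` sentinel to detect callers who rely on the default. `resolve_use_pyarrow` is a hypothetical helper, not part of dlt, and the choice of `DeprecationWarning` and the message text are assumptions.

```python
import warnings
from typing import Optional


def resolve_use_pyarrow(use_pyarrow: Optional[bool] = None) -> bool:
    """Hypothetical helper: resolve the flag, warning when the default is used."""
    if use_pyarrow is None:
        # Only callers who did not choose a value see the warning
        warnings.warn(
            "read_parquet() currently defaults to use_pyarrow=False; a future "
            "dlt release is expected to default to use_pyarrow=True. Pass "
            "use_pyarrow explicitly to keep the current behavior or to opt in.",
            DeprecationWarning,
            stacklevel=2,
        )
        return False  # current default, kept for backwards compatibility
    return use_pyarrow


# Record warnings so the behavior can be inspected
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    default_value = resolve_use_pyarrow()       # warns, returns False
    explicit_value = resolve_use_pyarrow(True)  # no warning
```

Because Python hides `DeprecationWarning` by default outside of `__main__` and test runners, a `FutureWarning` may be the better fit if the message is meant for end users.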

readers: added use_pyarrow kwarg + tests

7b21b94

zilto self-assigned this Sep 29, 2025

zilto added the enhancement New feature or request label Sep 29, 2025

rudolfix previously approved these changes Sep 30, 2025

zilto added the release-highlight Changes to highlight in release notes label Sep 30, 2025

zilto dismissed rudolfix’s stale review via 93280d9 September 30, 2025 13:49

removed pendulum.Timezone; doesnt exist in 2.0

22b8645

zilto force-pushed the feat/parquet-reader-optional-conversion branch from 93280d9 to 22b8645 Compare September 30, 2025 16:01

zilto merged commit 6fd20d1 into devel Sep 30, 2025
68 checks passed

zilto deleted the feat/parquet-reader-optional-conversion branch September 30, 2025 20:38
