[Epic] Split datasources out from datafusion crate (datafusion/core) #14444

Open
@alamb

Is your feature request related to a problem or challenge?

Historically DataFusion was one (very) large crate, datafusion, and as it grew bigger we extracted various functionality into separate crates. This leads to both faster compile times (as the crates can be compiled in parallel) and easier-to-navigate code (as the crates force a cleaner dependency separation).

As described by @waynexia, the build time of DataFusion has been growing.

Some of this is due to the fact that there is more code and more features to test. However, a non-trivial part of the long compile time is the time taken to compile the datafusion/core crate in https://github.com/apache/datafusion/tree/main/datafusion/core

While we are pursuing additional ways to reduce compile time, I think we should also move more code out of datafusion/core into their own crates.

We have successfully done this in the past with other projects such as

Describe the solution you'd like

I would like to split the code in https://github.com/apache/datafusion/tree/main/datafusion/core/src/datasource out of DataFusion core.

Describe alternatives you've considered

I think we will end up with several new crates

  • datafusion-catalog-listing: ListingTable and associated types like PartitionedFile
  • datafusion-datasource-parquet: ParquetExec and file format
  • datafusion-datasource-avro: AvroExec and file formats
  • datafusion-datasource-arrow
  • datafusion-datasource-json
  • datafusion-datasource-csv

I think we could start by creating datafusion-catalog-listing and pulling some of the listing table implementation into it, and then try to move one of the simpler datasources out (datafusion-datasource-arrow, perhaps).
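
One way such a split could avoid breaking downstream users is for the core crate to re-export the moved types from their old paths. The sketch below is illustrative only and is not taken from the DataFusion codebase: plain modules stand in for separate crates, the `PartitionedFile` struct is a simplified placeholder, and the module names `catalog_listing` and `datafusion_core` are hypothetical.

```rust
// Hypothetical sketch: modules stand in for separate crates.

/// Stand-in for a new `datafusion-catalog-listing` crate.
mod catalog_listing {
    /// Simplified placeholder; the real type carries more information.
    #[derive(Debug, Clone, PartialEq)]
    pub struct PartitionedFile {
        pub path: String,
        pub size: u64,
    }

    impl PartitionedFile {
        pub fn new(path: impl Into<String>, size: u64) -> Self {
            Self { path: path.into(), size }
        }
    }
}

/// Stand-in for `datafusion/core` after the split (assumed layout).
mod datafusion_core {
    pub mod datasource {
        pub mod listing {
            // Re-export so code that uses the old path keeps compiling.
            pub use crate::catalog_listing::PartitionedFile;
        }
    }
}

fn main() {
    // Both paths name the same type, so values compare equal.
    let via_old_path =
        datafusion_core::datasource::listing::PartitionedFile::new("data/part-0.parquet", 1024);
    let via_new_path = catalog_listing::PartitionedFile::new("data/part-0.parquet", 1024);
    assert_eq!(via_old_path, via_new_path);
    println!("{via_old_path:?}");
}
```

With real crates, the `pub use` would name the new crate as a dependency of the core crate rather than a sibling module, but the effect on callers would be the same.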

Additional context

No response

Labels: enhancement