Description
Is your feature request related to a problem or challenge?
Historically, DataFusion was one (very) large crate, datafusion, and as it grew bigger we extracted various functionality into separate crates. This leads to both faster compile times (as the crates can be compiled in parallel) and easier-to-navigate code (as the crates force a cleaner dependency separation).
As described by @waynexia, the build time of DataFusion has been growing.
Some of this is because there is more code and there are more features to test. However, a non-trivial part of the long compile time is the time taken to compile the datafusion/core crate in https://github.com/apache/datafusion/tree/main/datafusion/core

While we are pursuing additional ways to reduce compile time, I think we should also move more code out of datafusion/core into separate crates.
We have successfully done this in the past with other projects such as
- [EPIC] Extract remaining physical optimizer out of core #11502
- [Epic] Extract catalog functionality from the core to make it more modular #10782
Describe the solution you'd like
I would like to split https://github.com/apache/datafusion/tree/main/datafusion/core/src/datasource out of DataFusion core.
Describe alternatives you've considered
I think we will end up with several new crates:
- datafusion-catalog-listing: ListingTable and associated types like PartitionedFile
- datafusion-datasource-parquet: ParquetExec and file format
- datafusion-datasource-avro: AvroExec and file format
- datafusion-datasource-arrow
- datafusion-datasource-json
- datafusion-datasource-csv
I think we could start by creating datafusion-catalog-listing and pulling some of the listing table implementation into it, and then try moving one of the simpler datasources out (datafusion-datasource-arrow perhaps).
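One way such a move could stay backwards compatible is for core to re-export the moved types from their old location. The sketch below illustrates the idea in a single file, using modules as stand-ins for crates. It is an assumption, not something this issue specifies: the `PartitionedFile` here is a simplified, hypothetical struct (not the real DataFusion type), and the module layout is illustrative only.

```rust
// Stand-in for the proposed `datafusion-catalog-listing` crate.
// (A module is used here so the sketch compiles as a single file.)
mod datafusion_catalog_listing {
    /// Simplified, hypothetical stand-in for `PartitionedFile`;
    /// the real type has more fields.
    #[derive(Debug, Clone, PartialEq)]
    pub struct PartitionedFile {
        pub path: String,
        pub size: u64,
    }

    impl PartitionedFile {
        pub fn new(path: impl Into<String>, size: u64) -> Self {
            Self { path: path.into(), size }
        }
    }
}

// Stand-in for `datafusion/core` after the move: it no longer defines the
// type, it only re-exports it so existing import paths keep working.
mod datafusion_core {
    pub mod datasource {
        pub use super::super::datafusion_catalog_listing::PartitionedFile;
    }
}

fn main() {
    // Code written against the old path in core...
    let via_core = datafusion_core::datasource::PartitionedFile::new("data/a.arrow", 1024);
    // ...and code written against the new crate refer to the same type.
    let via_new_crate = datafusion_catalog_listing::PartitionedFile::new("data/a.arrow", 1024);
    assert_eq!(via_core, via_new_crate);
    println!("{via_core:?}");
}
```

With a re-export like this, downstream users would not need to change their imports when a type moves, while users who only need the listing functionality could depend on the smaller crate directly.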
Additional context
No response