Description
Is your feature request related to a problem or challenge?
There is an implicit assumption that metadata attached to a Schema is preserved during certain operations in DataFusion.
However, this expectation is not well tested or documented (e.g. see #12733).
Describe the solution you'd like
I would like these assumptions to be documented.
Describe alternatives you've considered
I suggest adding documentation to https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html that explains the high-level assumptions.
Then add a note/link to that section from the optimizer rule traits:
https://docs.rs/datafusion/latest/datafusion/optimizer/trait.AnalyzerRule.html
https://docs.rs/datafusion/latest/datafusion/optimizer/trait.OptimizerRule.html
https://docs.rs/datafusion/latest/datafusion/physical_optimizer/trait.PhysicalOptimizerRule.html
My understanding of the high-level assumptions is:
- Schema-level metadata: always passed through.
- Field-level metadata: when there is a clear 1-1 correspondence from an input column with metadata to an output column, the metadata should be preserved.
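A minimal sketch of these two rules applied to a projection. It assumes hypothetical, simplified `Field`, `Schema`, and `Expr` types (not DataFusion's or arrow-rs's actual API, although arrow-rs `Field` and `Schema` do carry metadata as `HashMap<String, String>`). Schema metadata is copied unchanged, and an output field keeps its metadata only when its expression is a bare column reference.

```rust
use std::collections::HashMap;

// Hypothetical simplified stand-ins for Arrow's Field and Schema.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

#[derive(Clone, Debug, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: HashMap<String, String>,
}

// Hypothetical expression type: a bare column reference, or a computed value
// identified only by its display name (e.g. "b+c").
enum Expr {
    Column(String),
    Computed(String),
}

// Output schema of a projection under the assumed rules.
fn project_schema(input: &Schema, exprs: &[Expr]) -> Schema {
    let fields = exprs
        .iter()
        .map(|e| match e {
            // Clear 1-1 correspondence with an input column: keep its metadata.
            Expr::Column(name) => input
                .fields
                .iter()
                .find(|f| &f.name == name)
                .cloned()
                .expect("column not found in input schema"),
            // Derived value with no single source column: no field metadata.
            Expr::Computed(name) => Field {
                name: name.clone(),
                metadata: HashMap::new(),
            },
        })
        .collect();
    Schema {
        fields,
        // Schema-level metadata is always passed through.
        metadata: input.metadata.clone(),
    }
}

// Helper to build a field with the given metadata pairs.
fn field(name: &str, meta: &[(&str, &str)]) -> Field {
    Field {
        name: name.to_string(),
        metadata: meta.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
    }
}

fn main() {
    let input = Schema {
        fields: vec![field("a", &[("unit", "m")]), field("b", &[]), field("c", &[])],
        metadata: HashMap::from([("source".to_string(), "sensor".to_string())]),
    };
    // PROJECT(a, b+c)
    let out = project_schema(
        &input,
        &[Expr::Column("a".to_string()), Expr::Computed("b+c".to_string())],
    );
    for f in &out.fields {
        println!("{}: {:?}", f.name, f.metadata);
    }
}
```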
Examples
PROJECT(a, b+c)
--> field metadata on column a should be preserved; there is no field metadata on b+c
SUM(a) .. GROUP BY b
--> field metadata on column b is preserved, but not that on column a
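The aggregate example can be sketched the same way, again with hypothetical simplified `Field` and `Schema` types rather than the real arrow-rs or DataFusion API. Group keys map 1-1 onto input columns, so they keep their metadata; an aggregate output such as SUM(a) is a new value and gets none. Listing group keys before aggregates is a choice made for this sketch.

```rust
use std::collections::HashMap;

// Hypothetical simplified stand-ins for Arrow's Field and Schema.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

#[derive(Clone, Debug, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: HashMap<String, String>,
}

// Output schema of `<aggregates> .. GROUP BY <keys>` under the assumed rules.
fn aggregate_schema(input: &Schema, group_by: &[&str], aggregates: &[&str]) -> Schema {
    // Group keys correspond 1-1 to input columns: copy them with metadata.
    let mut fields: Vec<Field> = group_by
        .iter()
        .map(|key| {
            input
                .fields
                .iter()
                .find(|f| f.name == *key)
                .cloned()
                .expect("group key not found in input schema")
        })
        .collect();
    // Aggregate results are new values: no field metadata.
    fields.extend(aggregates.iter().map(|name| Field {
        name: name.to_string(),
        metadata: HashMap::new(),
    }));
    Schema {
        fields,
        // Schema-level metadata is always passed through.
        metadata: input.metadata.clone(),
    }
}

fn main() {
    let meta = |k: &str, v: &str| HashMap::from([(k.to_string(), v.to_string())]);
    let input = Schema {
        fields: vec![
            Field { name: "a".to_string(), metadata: meta("unit", "m") },
            Field { name: "b".to_string(), metadata: meta("role", "key") },
        ],
        metadata: HashMap::new(),
    };
    // SUM(a) .. GROUP BY b
    let out = aggregate_schema(&input, &["b"], &["SUM(a)"]);
    for f in &out.fields {
        println!("{}: {:?}", f.name, f.metadata);
    }
}
```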