Document Schema metadata expectations #12736

Open
@alamb

Description

Is your feature request related to a problem or challenge?

There is an (implicit) assumption that metadata attached to Schema is preserved during certain operations in DataFusion.

However, this expectation is clearly not well tested or documented (e.g. see #12733).

Describe the solution you'd like

I would like the assumptions documented

Describe alternatives you've considered

I suggest adding documentation to https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html that explains the high level assumptions

Then add a note / link to that section from the optimizers:
https://docs.rs/datafusion/latest/datafusion/optimizer/trait.AnalyzerRule.html
https://docs.rs/datafusion/latest/datafusion/optimizer/trait.OptimizerRule.html
https://docs.rs/datafusion/latest/datafusion/physical_optimizer/trait.PhysicalOptimizerRule.html

My understanding of the high level assumptions is:

  • schema level metadata: always passed through
  • field level metadata: when there is a clear 1-1 correspondence from an input column with metadata to an output column, the metadata should be preserved

Examples

  • PROJECT(a, b+c) --> field metadata on a should be preserved, no field metadata on b+c
  • SUM(a) .. GROUP BY b --> field metadata on b is preserved, not on a
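
To make the two examples above concrete, here is a minimal sketch of the rule as I understand it. It is a std-only toy model, not DataFusion's or arrow's actual API: `Field`, `Schema`, `Expr`, `project`, `aggregate`, and the `unit` / `source` metadata keys are all hypothetical names made up for illustration.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real schema types (illustration only).
type Metadata = HashMap<String, String>;

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    metadata: Metadata,
}

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: Metadata,
}

#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    Add(Box<Expr>, Box<Expr>),
    Sum(Box<Expr>),
}

fn display_name(expr: &Expr) -> String {
    match expr {
        Expr::Column(name) => name.clone(),
        Expr::Add(l, r) => format!("{} + {}", display_name(l), display_name(r)),
        Expr::Sum(arg) => format!("SUM({})", display_name(arg)),
    }
}

// Field-level rule: only a bare column reference has a clear 1-1
// correspondence with an input column, so only it keeps field metadata.
fn output_field(expr: &Expr, input: &Schema) -> Field {
    let metadata = match expr {
        Expr::Column(name) => input
            .fields
            .iter()
            .find(|f| &f.name == name)
            .map(|f| f.metadata.clone())
            .unwrap_or_default(),
        _ => Metadata::new(),
    };
    Field { name: display_name(expr), metadata }
}

// Schema-level rule: schema metadata is always passed through.
fn project(exprs: &[Expr], input: &Schema) -> Schema {
    Schema {
        fields: exprs.iter().map(|e| output_field(e, input)).collect(),
        metadata: input.metadata.clone(),
    }
}

// Output is the group-by columns followed by the aggregate expressions.
fn aggregate(group_by: &[Expr], aggregates: &[Expr], input: &Schema) -> Schema {
    let exprs: Vec<Expr> = group_by.iter().chain(aggregates).cloned().collect();
    project(&exprs, input)
}

fn meta(pairs: &[(&str, &str)]) -> Metadata {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

// Input schema with columns a, b, c; a and b carry field metadata.
fn example_input() -> Schema {
    Schema {
        fields: vec![
            Field { name: "a".into(), metadata: meta(&[("unit", "meters")]) },
            Field { name: "b".into(), metadata: meta(&[("unit", "seconds")]) },
            Field { name: "c".into(), metadata: Metadata::new() },
        ],
        metadata: meta(&[("source", "example")]),
    }
}
```

Under this model, projecting `a` and `b + c` from `example_input()` keeps the `unit` metadata on `a` and leaves `b + c` with none, while `aggregate` with group-by `b` and `SUM(a)` keeps it on `b` only. In both cases the schema-level `source` entry is carried over unchanged.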

Additional context

No response

Metadata

Assignees

Labels

enhancement (New feature or request)