Skip to content

FR: Add plan_anchor field to PlanRel #725

@tokoko

Description

@tokoko

I'm working on a library that lets you build up substrait plans using dataframe API. Over the course of dataframe transformations various unrelated plans need to be merged to construct the final substrait plan. For the most part it's pretty straightforward, but you need to deal with extension functions and references during the merges.

Functions in substrait fortunately let you define "pointers" in a relaxed fashion with anchors. This means that as long as there is some global way of assigning a unique identifier to each function beforehand, merging plans become straightforward. Unlike functions, relationship between PlanRels and RefereceRels is modeled differently, RefereceRels need to be aware of the ordinal position of the target rel in the relations array. This makes merging plans very tricky. One has to either traverse the whole plan to adjust ordinal positions on each merge or use some sort of placeholders and do a single-pass traversal at the end.

To make this simpler, substrait could use the same anchor mechanism with PlanRels. An additional plan_anchor field can be introduced in PlanRel message, which will be used by ReferenceRel instead of relying on ordinal positions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions