Relative names #1619

@aljazerzen

Abstract:
I propose changing the syntax of column references from column to .column.

Reasoning:
I'll try to explain how the resolver works and how I think about the semantics of names and variables in PRQL.

During resolving, there is a major distinction between scoped and ephemeral variables:

  • Scoped variables have a definition and live as long as their scope exists. For example, std.sum and std.select are global, so they exist indefinitely, while function parameters exist only within the function body.
  • Ephemeral variables are just references into some other argument of the current function call. For example, when you call select, all columns of the relation exist as variables during resolution of the first argument.
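
To make the two mechanisms concrete, here is a toy sketch in Python. It is not the actual PRQL resolver: the `Scope` class, the `resolve_select` function, and the rule that columns are checked before scoped names are all assumptions made for illustration.

```python
# Toy model of the two lookup mechanisms. Hypothetical names throughout;
# this does not mirror the real PRQL resolver implementation.

class Scope:
    """Scoped variables: defined once, visible while the scope exists."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = {}

    def define(self, name, value):
        self.names[name] = value

    def lookup(self, name):
        # Walk outwards through enclosing scopes.
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise NameError(f"unknown name: {name}")


def resolve_select(scope, relation_columns, requested):
    """Resolve the first argument of a hypothetical `select` call.

    The columns of the input relation are visible only during this call
    (ephemeral); they are never defined in `scope`.
    Assumption: columns take precedence over scoped names.
    """
    resolved = []
    for name in requested:
        if name in relation_columns:
            resolved.append(("column", name))
        else:
            resolved.append(("scoped", scope.lookup(name)))
    return resolved


root = Scope()
root.define("std.sum", "<function sum>")      # global: exists indefinitely

body = Scope(parent=root)
body.define("rel", "<relation albums>")       # parameter: exists in the body only

result = resolve_select(body, {"alb.title", "artist_id"}, ["alb.title", "artist_id"])
```

In this sketch, `rel` and `std.sum` can be found through `body.lookup`, while `alb.title` can only be resolved inside `resolve_select`, where the relation's columns are known.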

It is beneficial to distinguish these two mechanisms because of their subtle differences. For example, take this query:

func my_transform rel -> (
    rel
    select [alb.title, artist_id]
)

from alb = albums
my_transform

Here, a relation is constructed with from, and within that relation the name alb is assigned to all columns from the table albums. Note that alb is not a "real" value; it is just a namespace for the columns. When this relation is passed to my_transform, it is stored in the rel parameter. rel is now a scoped variable, while alb.title is a reference to one of its columns.

I'm not sure if I've explained that well, please tell me if I haven't.

If I compare this behavior with, say, Python and a dataframe library, scoped variables are all normal identifiers, while ephemeral variables would be represented with strings. This is a bit more verbose and cannot provide good errors, typing, or autocomplete. (This is a feature of PRQL that dataframe libraries cannot copy. Only a custom language for relations can construct custom rules for name resolution.)
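
A minimal sketch of that comparison, using plain lists of dicts as a stand-in for a dataframe library. The `select` helper and the sample row are made up for illustration and do not correspond to any real library's API.

```python
def select(rows, columns):
    """Keep only the named columns; column names are plain strings."""
    return [{c: row[c] for c in columns} for row in rows]


def my_transform(rel):
    # `rel` is a normal identifier (the counterpart of a scoped variable).
    # "title" and "artist_id" are strings (the counterpart of ephemeral
    # variables), so a typo in them is only detected when the code runs.
    return select(rel, ["title", "artist_id"])


albums = [{"title": "A", "artist_id": 1, "album_id": 10}]
result = my_transform(albums)
```

A misspelled column such as `"titel"` would surface only as a `KeyError` at run time, whereas a misspelled `rel` is an ordinary undefined-name error that tooling can flag before execution.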

So, because there is a distinction in resolving, I suggest we add a distinction in syntax:

func my_transform rel -> (
    rel
    select [.alb.title, .artist_id]
)

from alb = albums
my_transform
sort .title
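
One consequence of the proposed syntax is that the kind of a name can be decided from its spelling alone, before any resolution happens. A rough Python sketch of that idea follows; the `classify` function is hypothetical and is not part of the PRQL compiler.

```python
def classify(ident):
    """Split an identifier into (kind, path) based on a leading dot.

    Assumption: under the proposed syntax, every column reference starts
    with a dot and every scoped variable does not.
    """
    if ident.startswith("."):
        return ("column", ident[1:].split("."))
    return ("scoped", ident.split("."))


kinds = [classify(name) for name in [".alb.title", ".artist_id", "rel", "std.sum"]]
```

Here `.alb.title` and `.artist_id` are classified as columns, while `rel` and `std.sum` are classified as scoped names, without consulting any scope or relation.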

Pros:

  • the distinction in syntax hints at the distinction in resolving
  • for newcomers, the rule is simple: columns start with a dot

Cons:

  • additional syntax we could do without
