Abstract:
I propose changing column references from `column` to `.column`.
Reasoning:
I'll try to explain how the resolver works and how I think about the semantics of names and variables in PRQL.
During resolution, there is a major distinction between scoped and ephemeral variables:
- Scoped variables have a definition and live as long as their scope exists. For example, `std.sum` and `std.select` are global, so they exist indefinitely, while function parameters exist only within the function body.
- Ephemeral variables are just references into some other argument of the current function call. For example, when you call `select`, all columns of the relation exist as variables during resolution of the first argument.
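To make the two mechanisms concrete, here is a minimal Python sketch of a toy resolver. It is not the actual PRQL resolver: the `Scope` class, the `resolve` function and the placeholder values are all hypothetical. Scoped names are found by walking a chain of scopes (a function scope whose parent is the global scope), while ephemeral names are simply the columns of the relation argument, visible only for the duration of one `resolve` call. Checking columns before the scope chain is an arbitrary choice made here; with a single syntax for both kinds of names, some such precedence rule is unavoidable.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class Scope:
    """A lexical scope: names with definitions, chained to an enclosing scope."""
    names: Dict[str, str]
    parent: Optional["Scope"] = None

    def lookup(self, name: str) -> str:
        # Walk outwards until some enclosing scope defines the name.
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise KeyError(name)


def resolve(name: str, scope: Scope, relation_columns: Set[str]) -> Tuple[str, str]:
    """Resolve a name that appears inside an argument of a call like `select`."""
    # Ephemeral: columns of the relation argument, only visible during this call.
    # (Hypothetical precedence: columns are checked before the scope chain.)
    if name in relation_columns:
        return ("column", name)
    # Scoped: anything defined in the current scope or an enclosing one.
    try:
        return ("scoped", scope.lookup(name))
    except KeyError:
        raise NameError(f"unknown name: {name}") from None


# Globals such as std.sum live indefinitely; `rel` lives only in the function body.
global_scope = Scope({"std.sum": "<builtin sum>", "std.select": "<builtin select>"})
function_scope = Scope({"rel": "<relation argument>"}, parent=global_scope)
columns = {"alb.title", "artist_id"}

print(resolve("alb.title", function_scope, columns))  # ('column', 'alb.title')
print(resolve("rel", function_scope, columns))        # ('scoped', '<relation argument>')
print(resolve("std.sum", function_scope, columns))    # ('scoped', '<builtin sum>')
```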
It is beneficial to distinguish these two mechanisms because of their subtle differences. For example, take this query:
func my_transform rel -> (
rel
select [alb.title, artist_id]
)
from alb = albums
my_transform
Here, a relation is constructed with `from`, and within that relation the name `alb` is assigned to all columns from the table `albums`. Note that `alb` is not a "real" value; it's just a namespace for the columns. When this relation is passed to `my_transform`, it is stored in the `rel` parameter. `rel` is now a scoped variable, while `alb.title` is a reference to one of its columns.
I'm not sure I've explained that well; please tell me if I haven't.
If I compare this behavior with, say, Python and a dataframe library, scoped variables are all normal identifiers, while ephemeral variables (columns) would be represented with strings. This is a bit more verbose and cannot provide good errors, typing or autocomplete. (This is a feature of PRQL that dataframe libraries cannot copy: only a custom language for relations can define custom rules for name resolution.)
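The dataframe comparison can be sketched in plain Python. This is a hypothetical stand-in, not any library's API: a relation is a list of dicts, `select` is a made-up helper, and the rows are made-up sample data. The parameter `rel` is an ordinary identifier, so misspelling it is a `NameError` that tools such as linters can typically flag, while column names are only strings, so a misspelled column is just a different string and surfaces as a `KeyError` at run time.

```python
from typing import Dict, List

Row = Dict[str, object]

# Made-up sample rows standing in for the `albums` table.
albums: List[Row] = [
    {"album_id": 1, "title": "Album A", "artist_id": 1},
    {"album_id": 2, "title": "Album B", "artist_id": 2},
]


def select(rel: List[Row], columns: List[str]) -> List[Row]:
    """Keep only the named columns; the names are plain strings."""
    return [{c: row[c] for c in columns} for row in rel]


def my_transform(rel: List[Row]) -> List[Row]:
    # `rel` is a normal identifier (a scoped variable); "title" and
    # "artist_id" are strings the language cannot check or autocomplete.
    return select(rel, ["title", "artist_id"])


result = my_transform(albums)
print(result)
# [{'title': 'Album A', 'artist_id': 1}, {'title': 'Album B', 'artist_id': 2}]

# A misspelled column is only detected when a row is actually accessed.
try:
    select(albums, ["titel"])
except KeyError as err:
    print("missing column:", err.args[0])  # missing column: titel
```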
Because there is a distinction in resolution, I suggest we add a distinction in syntax:
func my_transform rel -> (
rel
select [.alb.title, .artist_id]
)
from alb = albums
my_transform
sort .title
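Under the proposed rule, the resolver no longer needs a precedence between the two kinds of names: the leading dot alone decides where to look. Below is a minimal Python sketch, assuming a flat dict for the scope and a set of (possibly namespaced) column names for the relation; `find_column` and `resolve_dotted` are hypothetical functions, not part of the PRQL compiler. Letting `.title` match `alb.title` when exactly one column ends in `title` is an additional assumption, made so that `sort .title` in the query above resolves.

```python
from typing import Dict, Set, Tuple


def find_column(column: str, columns: Set[str]) -> str:
    """Find a column by exact name, or by its last segment if unambiguous."""
    if column in columns:
        return column
    # Assumption: `title` may refer to `alb.title` if no other column ends in `title`.
    candidates = [c for c in columns if c.split(".")[-1] == column]
    if len(candidates) == 1:
        return candidates[0]
    if candidates:
        raise NameError(f"ambiguous column {column!r}: {sorted(candidates)}")
    raise NameError(f"relation has no column {column!r}")


def resolve_dotted(name: str, scope: Dict[str, str], columns: Set[str]) -> Tuple[str, str]:
    """A leading dot means 'column of the relation'; anything else is a scoped variable."""
    if name.startswith("."):
        return ("column", find_column(name[1:], columns))
    if name in scope:
        return ("scoped", scope[name])
    # No fallback to columns: `title` without a dot is never a column.
    raise NameError(f"unknown variable {name!r}")


scope = {"rel": "<relation argument>", "std.sum": "<builtin sum>"}
columns = {"alb.title", "artist_id"}

print(resolve_dotted(".alb.title", scope, columns))  # ('column', 'alb.title')
print(resolve_dotted(".artist_id", scope, columns))  # ('column', 'artist_id')
print(resolve_dotted(".title", scope, columns))      # ('column', 'alb.title')
print(resolve_dotted("rel", scope, columns))         # ('scoped', '<relation argument>')
```

In this sketch the two lookups are disjoint: because undotted names never fall back to columns, a column can never shadow a function parameter or a global such as `std.sum`, and an error message can state which of the two lookups failed.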
Pros:
- the distinction in syntax hints at the distinction in resolution
- for newcomers, the rule is simple: columns start with a dot
Cons:
- additional syntax we could do without