-
Notifications
You must be signed in to change notification settings - Fork 185
Redesign the group, bin, and stack transforms. #176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Oh, this also |
Supporting transform composition seems necessary in theory, and useful in practice: as a user, once I have learned how to use transform to prepare my data for the mark, I want the experience to be consistent (in particular I don't want to lose my transforms when I switch from a bar mark to a stacked bar mark: the only thing I want to have to focus on is what I need to add to "stack", probably a z). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also related #145 and "Plot.(mark)y confusions"
src/marks/group.js
Outdated
} | ||
); | ||
export function groupCell(data, options) { | ||
return cell(...group(data, {out: "fill", ...options})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out:"fill"
is a bit awkward, the goal is to indicate the name of the property we want to set in the result.
I have struggled with the same issue in Plot.stack. My solution was to give a fixed name (e.g. high, low), then convert this later with {y2: high, y1: low, y: mid, value: value}
(but I couldn't find a nice way to do it from within the call).
https://github.com/observablehq/plot/pull/132/files#diff-90dbe63978af6da907d965dde0b7bcb94aa16eccae54868c5b075122a6e2ec5aR9
If I wanted to follow your approach but for multiple return values, would I have to write something like ...stack(data, {value: y, out_high: "y2", out_low: "y1", out_value: "value", out_mid: "y"…}
, or maybe ...stack(data, {value: y, out: {high: "y2", low: "y1", value: "value", mid: "y"…}
?
It would also be good to have a default "out" that would keep the names as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered having separate groupR and groupFill helpers which specify out = r and out = fill respectively, but since there are already helpers for groupCell and groupDot, it didn’t seem necessary. I expect that the out argument will be used rarely by users.
And it is syntactically shorter than having to use destructuring for the same thing, so if the expectation is that these things will always be renamed, it seems nicer to design for it. If it outputs to a z channel and requires destructuring to use, how is that better?
For stack I would expect that you would always use stackX or stackY. When you would you use something else? And so I don’t think I would expose the stack function. Even if you did want something other than stackX or stackY, you could always rename x/x1/x2 or y/y1/y2 the same way you are currently renaming high/low/etc. The latter are never usable directly as mark channels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stepping back for a second, I think the desire for these transforms (stack, group, bin) is to produce data + channel definitions that can be passed to marks. Mark channels have well-known names (e.g., x1, x2), and there only a small number of them, so I’d prefer the transforms to return data structures that use these names rather than introducing new transform-specific names.
In the case of stack, the anticipated “outs” are either x/y1/y2 or y/x1/x2, so if the goal is to share code internally, I could imagine an out = y for the first case and an out = x for the second case. But again it would just be an internal method so it doesn’t really matter what we call it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure we want to support stacked lines, but I might prefer stackLineX1, stackLineX2, stackLineY1, stackLineY2 to the position argument for similar reasons. (The position argument introduces new words bottom/center/top whereas x1/x2/y1/y2 are existing words, and also are less confusing in the stackX case.) I’m not sure when we’d want the midline, so I’d remove that for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decoupling the mark and the stack operation just helped me to think it thru. But you're right we can go back to stackY by default.
I use the midline in the policeDeaths plot, to create a guide in the middle (mostly for aesthetic reasons), and also to position the labels. Also, transformed into a textPath, to position labels in the US population chart.
src/transforms/group.js
Outdated
]; | ||
} | ||
|
||
export function group(data, {x = first, y = second, out, transform, ...options} = {}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out = "frequency"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably better to throw explicitly if (out === undefined)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There’s no mark channel named “frequency”, so I don’t think we want that as a default — it’s not usable directly. Agree we should throw an error if it’s missing though.
Unfortunately I just realized that facet-aware transforms break transform composition, specifically this test here: Line 36 in 789ea0e
If there are two transforms, say one custom transform supplied by the user, and the other something like Plot.groupX, the user-supplied one may not be facet aware. So transform composition will need to promote the user-supplied transform to become facet aware in order for the second transform to work correctly. That, or we won’t support the custom transform being facet-aware, which could also be confusing to the user. To keep things simple it might make sense to not support transform composition explicitly, and require the user to wrap the transform if desired. |
I got the simpler syntax working. To use Plot.groupX with Plot.barY, it’s now: Plot.barY(letters, Plot.groupX({x: uppers})) Now I wonder if we should get rid of Plot.groupBarY? Is it still sufficiently useful? Plot.groupBarY(letters, {x: uppers}) It’s slightly shorter, but I think there’s something nice — more revealing, self-explanatory, with fewer concepts — about the parsimonious approach. I worry a little that we are blurring the definition of “data transforms”: the group transform is no longer just a data transformation, but also channel definitions albeit decoupled from a specific mark type. It’s more like shorthand for mark options, which is useful but means it’s a larger responsibility than transforming the data. But then transforms were already heading that way when they became facet-aware… and this is clearly needed for stack. I’m ruminating on Tom MacWright’s “One way to represent things”. We’re intentionally avoiding one way to represent data with Plot because we know that data comes in so many different forms. I’m tempted to say that each mark type should have a preferred data representation, the form that works with its default accessors, e.g. passing [[x1, y1], [x2, y2], …] to Plot.dot. But I think that’s a distraction: we already have a preferred representation in channel definitions! Columns are more efficient (Plot uses it internally) and more composable (e.g., Plot.movingAverage). Our preferred representation is thus {length} as data (or an index) and then a set of columns. So in that broader sense these are still data transformations. |
Okay, added stack. 👍 Feeling good about this PR. |
Do you approve of this PR, @Fil? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I think the remaining issue (filtering #138) is (a bit) orthogonal.
This makes the group transforms more similar to the stack transforms, with the goal of consistency and making it easier to apply them to other mark types (e.g., applying the Plot.groupX transform to a Plot.dot mark).
I still want to investigate whether I can rewrite these as channel transforms, which would mean we wouldn’t need to pass the data to the group transforms. But I suspect that may make supporting facets tricky, so, I’m not sure if I’m going to explore that just now.
I also wonder whether the group transforms need to support transform composition (passing a transform option to the group transform, which is applied prior to grouping). The stack transforms don’t support that in main, and there aren’t any checked-in examples that need it, so I’ve forgotten whether I needed this for something specific or if it was motivated abstractly.
Let me know what you think!
Fixes #145.