Redesign the group, bin, and stack transforms. #176

mbostock · 2021-02-27T01:02:38Z

This makes the group transforms more similar to the stack transforms, with the goal of consistency and making it easier to apply them to other mark types (e.g., applying the Plot.groupX transform to a Plot.dot mark).

I still want to investigate whether I can rewrite these as channel transforms, which would mean we wouldn’t need to pass the data to the group transforms. But I suspect that may make supporting facets tricky, so, I’m not sure if I’m going to explore that just now.

I also wonder whether the group transforms need to support transform composition (passing a transform option to the group transform, which is applied prior to grouping). The stack transforms don’t support that in main, and there aren’t any checked-in examples that need it, so I’ve forgotten whether I needed this for something specific or if it was motivated abstractly.

Let me know what you think!

Fixes #145.

mbostock · 2021-02-27T01:02:54Z

Oh, this also ~~tackles part of~~ subsumes #104.

Fil · 2021-02-27T09:37:57Z

Supporting transform composition seems necessary in theory, and useful in practice: as a user, once I have learned how to use transform to prepare my data for the mark, I want the experience to be consistent (in particular I don't want to lose my transforms when I switch from a bar mark to a stacked bar mark: the only thing I want to have to focus on is what I need to add to "stack", probably a z).

Fil

Also related #145 and "Plot.(mark)y confusions"

Fil · 2021-02-27T09:50:08Z

src/marks/group.js

-    }
-  );
+export function groupCell(data, options) {
+  return cell(...group(data, {out: "fill", ...options}));


out:"fill" is a bit awkward, the goal is to indicate the name of the property we want to set in the result.

I have struggled with the same issue in Plot.stack. My solution was to give a fixed name (e.g. high, low), then convert this later with {y2: high, y1: low, y: mid, value: value} (but I couldn't find a nice way to do it from within the call).
https://github.com/observablehq/plot/pull/132/files#diff-90dbe63978af6da907d965dde0b7bcb94aa16eccae54868c5b075122a6e2ec5aR9

If I wanted to follow your approach but for multiple return values, would I have to write something like ...stack(data, {value: y, out_high: "y2", out_low: "y1", out_value: "value", out_mid: "y"…}, or maybe ...stack(data, {value: y, out: {high: "y2", low: "y1", value: "value", mid: "y"…}?

It would also be good to have a default "out" that would keep the names as-is.

I considered having separate groupR and groupFill helpers which specify out = r and out = fill respectively, but since there are already helpers for groupCell and groupDot, it didn’t seem necessary. I expect that the out argument will be used rarely by users.

And it is syntactically shorter than having to use destructuring for the same thing, so if the expectation is that these things will always be renamed, it seems nicer to design for it. If it outputs to a z channel and requires destructuring to use, how is that better?

For stack I would expect that you would always use stackX or stackY. When you would you use something else? And so I don’t think I would expose the stack function. Even if you did want something other than stackX or stackY, you could always rename x/x1/x2 or y/y1/y2 the same way you are currently renaming high/low/etc. The latter are never usable directly as mark channels.

Stepping back for a second, I think the desire for these transforms (stack, group, bin) is to produce data + channel definitions that can be passed to marks. Mark channels have well-known names (e.g., x1, x2), and there only a small number of them, so I’d prefer the transforms to return data structures that use these names rather than introducing new transform-specific names.

In the case of stack, the anticipated “outs” are either x/y1/y2 or y/x1/x2, so if the goal is to share code internally, I could imagine an out = y for the first case and an out = x for the second case. But again it would just be an internal method so it doesn’t really matter what we call it.

I’m not sure we want to support stacked lines, but I might prefer stackLineX1, stackLineX2, stackLineY1, stackLineY2 to the position argument for similar reasons. (The position argument introduces new words bottom/center/top whereas x1/x2/y1/y2 are existing words, and also are less confusing in the stackX case.) I’m not sure when we’d want the midline, so I’d remove that for now.

Decoupling the mark and the stack operation just helped me to think it thru. But you're right we can go back to stackY by default.

I use the midline in the policeDeaths plot, to create a guide in the middle (mostly for aesthetic reasons), and also to position the labels. Also, transformed into a textPath, to position labels in the US population chart.

Fil · 2021-02-27T09:53:57Z

src/transforms/group.js

+  ];
+}
+
+export function group(data, {x = first, y = second, out, transform, ...options} = {}) {


out = "frequency"

probably better to throw explicitly if (out === undefined)

There’s no mark channel named “frequency”, so I don’t think we want that as a default — it’s not usable directly. Agree we should throw an error if it’s missing though.

mbostock · 2021-02-27T18:15:10Z

Supporting transform composition seems necessary in theory, and useful in practice

Unfortunately I just realized that facet-aware transforms break transform composition, specifically this test here:

plot/src/mark.js

Line 36 in 789ea0e

} else if (this.transform.length === 2) { // facet-aware transform

If there are two transforms, say one custom transform supplied by the user, and the other something like Plot.groupX, the user-supplied one may not be facet aware. So transform composition will need to promote the user-supplied transform to become facet aware in order for the second transform to work correctly. That, or we won’t support the custom transform being facet-aware, which could also be confusing to the user.

To keep things simple it might make sense to not support transform composition explicitly, and require the user to wrap the transform if desired.

mbostock · 2021-02-27T18:40:44Z

I got the simpler syntax working. To use Plot.groupX with Plot.barY, it’s now:

Plot.barY(letters, Plot.groupX({x: uppers}))

Now I wonder if we should get rid of Plot.groupBarY? Is it still sufficiently useful?

Plot.groupBarY(letters, {x: uppers})

It’s slightly shorter, but I think there’s something nice — more revealing, self-explanatory, with fewer concepts — about the parsimonious approach.

I worry a little that we are blurring the definition of “data transforms”: the group transform is no longer just a data transformation, but also channel definitions albeit decoupled from a specific mark type. It’s more like shorthand for mark options, which is useful but means it’s a larger responsibility than transforming the data. But then transforms were already heading that way when they became facet-aware… and this is clearly needed for stack.

I’m ruminating on Tom MacWright’s “One way to represent things”. We’re intentionally avoiding one way to represent data with Plot because we know that data comes in so many different forms. I’m tempted to say that each mark type should have a preferred data representation, the form that works with its default accessors, e.g. passing [[x1, y1], [x2, y2], …] to Plot.dot. But I think that’s a distraction: we already have a preferred representation in channel definitions! Columns are more efficient (Plot uses it internally) and more composable (e.g., Plot.movingAverage). Our preferred representation is thus {length} as data (or an index) and then a set of columns. So in that broader sense these are still data transformations.

mbostock · 2021-02-28T00:38:39Z

Okay, added stack. 👍 Feeling good about this PR.

mbostock · 2021-03-01T17:02:24Z

Do you approve of this PR, @Fil?

Fil

Yes. I think the remaining issue (filtering #138) is (a bit) orthogonal.

redesign group transform

c0017da

mbostock requested a review from Fil February 27, 2021 01:02

Fil reviewed Feb 27, 2021

View reviewed changes

simpler group syntax

fc4f5ec

mbostock added 2 commits February 27, 2021 11:24

redesign bin transform

53f5b1d

order exports

976ee09

mbostock mentioned this pull request Feb 27, 2021

rename transforms #104

Closed

mbostock added this to the Friends Preview milestone Feb 27, 2021

mbostock changed the title ~~Redesign the group transforms.~~ Redesign the group and bin transforms. Feb 27, 2021

mbostock added 4 commits February 27, 2021 12:46

less maybeValue madness

3502c09

redesign stack transform

5c0d2ed

remove redundant label

8e21466

deduplicate code

743c8b7

mbostock changed the title ~~Redesign the group and bin transforms.~~ Redesign the group, bin, and stack transforms. Feb 28, 2021

mbostock added 2 commits February 27, 2021 16:53

deduplicate code

73e3a66

deduplicate code

d8a1a50

Fil mentioned this pull request Feb 28, 2021

A generic filter transform to hide marks (perhaps usable together with sort for line and area)? #138

Closed

lazyChannel

a6bf930

Fil added a commit that referenced this pull request Mar 1, 2021

stacks, rebased on #176

c108698

Fil mentioned this pull request Mar 1, 2021

stacks #177

Merged

Fil approved these changes Mar 1, 2021

View reviewed changes

mbostock merged commit a6bf930 into main Mar 1, 2021

mbostock deleted the mbostock/nugroup branch March 1, 2021 20:22

mbostock pushed a commit that referenced this pull request Mar 3, 2021

stacks, rebased on #176

9deabf1

mbostock pushed a commit that referenced this pull request Mar 3, 2021

stacks, rebased on #176

783e4ca

mbostock mentioned this pull request Mar 4, 2021

Plot.link default channels? #116

Closed

backend-devloper pushed a commit to backend-devloper/plot that referenced this pull request Nov 24, 2023

stacks, rebased on observablehq/plot#176

4385aa3

tigrevol8888 added a commit to tigrevol8888/plot that referenced this pull request Jul 5, 2024

stacks, rebased on observablehq/plot#176

caa785d

Redesign the group, bin, and stack transforms. #176

Redesign the group, bin, and stack transforms. #176

Uh oh!

Conversation

mbostock commented Feb 27, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mbostock commented Feb 27, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Fil commented Feb 27, 2021

Uh oh!

Fil left a comment

Choose a reason for hiding this comment

Uh oh!

Fil Feb 27, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mbostock Feb 27, 2021

Choose a reason for hiding this comment

Uh oh!

mbostock Feb 27, 2021

Choose a reason for hiding this comment

Uh oh!

mbostock Feb 27, 2021

Choose a reason for hiding this comment

Uh oh!

Fil Feb 27, 2021

Choose a reason for hiding this comment

Uh oh!

Fil Feb 27, 2021

Choose a reason for hiding this comment

Uh oh!

Fil Feb 27, 2021

Choose a reason for hiding this comment

Uh oh!

mbostock Feb 27, 2021

Choose a reason for hiding this comment

Uh oh!

mbostock commented Feb 27, 2021

Uh oh!

mbostock commented Feb 27, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mbostock commented Feb 28, 2021

Uh oh!

mbostock commented Mar 1, 2021

Uh oh!

Fil left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mbostock commented Feb 27, 2021 •

edited

Loading

mbostock commented Feb 27, 2021 •

edited

Loading

Fil Feb 27, 2021 •

edited

Loading

mbostock commented Feb 27, 2021 •

edited

Loading