Auto mark: never render rect; move zero-ness to autoSpec #1368

tophtucker · 2023-03-21T23:00:39Z

Still pretty rough… head's spinning a bit from all the combinations.

Fixes #1340 by preferring rectY over rect… but that messes with our zero-ness heuristic, which depended on the inferred mark implementation. Moved that logic to autoSpec but haven't fixed it in the heatmap case yet.

Fixes #1365 by partially reverting 6eab4f2's changes to how we decide the mark type.

To-do:

- Fix heatmap issue: only default to zero-ness if the bar is "standing on" the baseline, i.e., doesn't have both x1 & x2 or y1 & y2 set
- Materialize less often?
- Fix tests
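A rough sketch of the "standing on the baseline" check from the first to-do, assuming we look at the positional channels the mark ends up with after any transform. `standsOnBaseline` is a hypothetical helper, not Plot's API, and it is only a necessary condition: the mark implementation still decides which dimension actually gets the baseline.

```js
// Hypothetical helper (not Plot's API). Given the positional channels a mark
// ends up with, report along which dimensions it could "stand on" a zero
// baseline: the dimension must be present, and must not already have both
// of its endpoints (x1 & x2, or y1 & y2) set.
function standsOnBaseline(channels) {
  const has = (name) => channels[name] != null;
  const along = (d) =>
    (has(d) || has(`${d}1`) || has(`${d}2`)) && !(has(`${d}1`) && has(`${d}2`));
  return {x: along("x"), y: along("y")};
}

// A binned heatmap pins both ends on both dimensions: no baseline either way.
standsOnBaseline({x1: [], x2: [], y1: [], y2: []}); // {x: false, y: false}

// A histogram-style rectY has bin edges on x but only y2: it can stand on y = 0.
standsOnBaseline({x1: [], x2: [], y2: []}); // {x: false, y: true}
```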

Questions:

- Should setting zero change which mark renders?
  - In the autoBarZero test, zero was being used as a cue that we could draw bars instead of dots. I wonder if that's the right heuristic. The more salient thing about that dataset might be that there's exactly one data point per domain value?
  - Should setting zero change a line to an area? Areas' guarantee of a meaningful zero does distinguish them from lines, but lines could also have a meaningful zero. Area implies zero, but zero doesn't imply area.
  - If someone has a temperature chart, they probably only expect setting zero to add a baseline rule. But maybe you shouldn't use zero for that?
  - Overall, it feels too "clever" and too surprising for setting zero to change the mark. But doesn't it feel wrong that you can't automatically get a bar chart from the alphabet dataset?

Current broken tests

Definitely gotta fix the heatmaps. The alphabet example I'm not sure about; maybe people should just have to specify bar there. The mean zero going back to being a line feels OK.

mbostock · 2023-03-21T23:53:54Z

src/marks/auto.js

```diff
@@ -75,19 +75,43 @@ export function autoSpec(data, options) {
        : null;
  }

+  // TODO: should we just always materialize in here?
```

No, we should materialize only when necessary; this should be moved into conditional checks below. (But there’s no rush to do that now while we’re still figuring out the heuristic.)

tophtucker · 2023-03-22T03:05:41Z

The heatmap examples are now fixed by checking !colorReduce. Would that ever fail? I mean, culmen length and body mass both have meaningful zeros, but I still don't expect or want the baselines there, because the values aren't encoded by the length of the bars, only their position.

Binning / grouping should never change the zero-ness of anything, right? In a histogram, the binning is on the other dimension from the one with the meaningful zero.

In this PR we now have two places where we assert zero-ness: before inferring the mark type, and after. I'm trying to think through whether that makes sense. Before, we check only if a zero reducer has been applied. After, we check if we've picked a mark that has to "stand on" a baseline. That feels sorta reasonable?

For the autoBarZero test (the simple bar chart of alphabet), here's a demo of an alternate heuristic that prefers bars when there's one data point for each value in an ordinal domain. (But really it should probably depend on that and zero-ness.) https://observablehq.com/d/ac53225f9c7e967b
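The linked notebook has the actual demo; separately, here is a minimal plain-JS sketch of that alternate heuristic, assuming rows are objects keyed by field name. `oneDatumPerValue` and `preferBars` are hypothetical names, and folding zero-ness in is just the "should probably depend on that and zero-ness" idea made literal.

```js
// Hypothetical heuristic (sketch only): does every distinct value of an
// ordinal field appear exactly once in the data?
function oneDatumPerValue(data, field) {
  const seen = new Set();
  for (const d of data) {
    const v = d[field];
    if (seen.has(v)) return false; // a repeated value: not one datum per value
    seen.add(v);
  }
  return seen.size > 0; // empty data gives no evidence either way
}

// Prefer bars only when the user also asked for a meaningful zero.
function preferBars(data, ordinalField, zero) {
  return zero === true && oneDatumPerValue(data, ordinalField);
}

preferBars([{letter: "A"}, {letter: "B"}], "letter", true); // true
preferBars([{letter: "A"}, {letter: "A"}], "letter", true); // false
```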

tophtucker · 2023-03-22T23:32:44Z

Paired with Mike. To move the zero-ness determination into autoSpec (which would move it above the determination of mark and transform implementations), we started down the road of mirroring the logic for deciding the mark and transform implementations, since zero-ness depends on that. But that started to feel like a boondoggle; we're going to refactor a bit, but the sequence of the logic will stay more similar to how it is now.

Here's my current understanding of how it should work:

- The mark type (line, bar, area…) depends only on the chosen fields and reducers, never on zero-ness. Picking a zero reducer (count, sum…) may affect the mark type, but passing in zero: true won't.
- The mark implementation (barX, barY, ...) may depend on zero-ness: when there are three channels we need to fill (either x, y1, and y2 or y, x1, and x2) but only two channels are specified in options (x and y), we need to decide whether x1 or y1 should be inferred to be 0.
  - TODO: Should this really depend on the zero option the user passes in, or only on whether a zero reducer was specified on that dimension?
- The transform implementation does not depend on zero-ness.
- The zero-ness, if still undefined, is chosen based on the mark and transform implementations. A dimension is zero-ful if it's defined, we're not binning along that dimension, and the mark is a bar, area, rect, or rule extending continuously along that dimension. At that point, it affects only whether a baseline is drawn.
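The last step is concrete enough to sketch. This mirrors the xZero condition in src/marks/auto.js, including the bin check for the heatmap case. Assumptions: strings stand in for Plot's mark and transform functions (the real code compares function references like barX and binX), and xZero/yZero may already have been set by the user or by the earlier zero-reducer check.

```js
// Sketch only: strings stand in for Plot's mark and transform functions.
const X_BASELINE_MARKS = new Set(["barX", "areaX", "rectX", "ruleY"]);
const Y_BASELINE_MARKS = new Set(["barY", "areaY", "rectY", "ruleX"]);

function inferZero({x, y, xZero, yZero, markImpl, transformImpl}) {
  // A dimension is zero-ful if it is defined, we are not binning along it,
  // and the mark extends continuously along it from a baseline.
  if (xZero === undefined)
    xZero = x != null && transformImpl !== "bin" && transformImpl !== "binX" && X_BASELINE_MARKS.has(markImpl);
  if (yZero === undefined)
    yZero = y != null && transformImpl !== "bin" && transformImpl !== "binY" && Y_BASELINE_MARKS.has(markImpl);
  return {xZero, yZero};
}

inferZero({x: "date", y: "temp", markImpl: "areaY"}); // {xZero: false, yZero: true}
inferZero({x: "a", y: "b", markImpl: "rectY", transformImpl: "bin"}); // {xZero: false, yZero: false}
```

Values set earlier (for example yZero: true from a count reducer) pass through untouched, which is what lets the two places that assert zero-ness coexist.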

And here's a sketch of the refactoring. The public interfaces for autoSpec and auto won't change, but the bulk of the logic — formerly all held in auto, then split into auto and autoSpec — will be re-consolidated in a new private function, autoImpl.

```js
// expose information about what would be done; don't instantiate
export function autoSpec(data, options) {
  const {x, y, fx, fy, color, size, mark} = autoImpl(data, options);
  return {x, y, fx, fy, color, size, mark};
}

// decide mark and transform implementations and anything that was undefined in options;
// everything but instantiating
function autoImpl(data, options) {
  // most of the auto logic gets re-consolidated here
  return {...options, markImpl, markOptions, transformImpl, transformOptions};
}

// actually instantiate; just the last few lines of today's auto
export function auto(data, options) {
  const {markImpl, markOptions, transformImpl, transformOptions, xZero, yZero, fx, fy, colorMode} =
    autoImpl(data, options);
  return marks(/* ... */);
}
```

tophtucker · 2023-03-23T04:32:27Z

OK, the only remaining two snapshot test failures are deliberate consequences of the change in philosophy to say that mark type should never depend on zero-ness. I've updated the tests so they won't fail any more, and updated the names: autoBarZero → autoDotZero; autoBarMeanZero → autoLineMeanZero.

Code (before/after snapshots not reproduced):

- `alphabet, {x: {value: "frequency", zero: true}, y: "letter"}` (autoBarZero → autoDotZero)
- `weather, {x: "date", y: {value: "temp_max", reduce: "mean", zero: true}}` (autoBarMeanZero → autoLineMeanZero)

For the second one, it was already a bit of a toss-up which way we'd want to see. The first one's a bit of a shame; it feels a little weird that you can never get a simple bar chart from auto except by asking for it. 🤷

(Philosophical aside! I think part of the auto philosophy is that we should be able to infer mark types from information about the data. It feels nice to me to describe that property of the data that makes it deserve the mark, rather than just asking for the mark; it moves the location of human input upstream, so that we could potentially make other decisions based on it. The human annotation of zero-ness could also inform color scales, whereas the human annotation of bar-ness doesn't generalize so well. Better yet, if zero-ness were part of the table schema rather than part of the chart config, it could produce better defaults for all charts of that data. That feels like a good academic topic: what sorts of stronger types could we annotate numbers with to produce better default displays?)

I had my li’l auto matrix test, which works nicely in the Vite live preview:

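```js
import * as htl from "htl";
// autoHistogram, autoDotZero, autoLineZero, etc. are the autoplot snapshot tests defined elsewhere in the test suite.

export async function autoMatrix() {
  return htl.html`<div style="display: flex; flex-wrap: wrap;">
    <style>svg { width: 320px; }</style>
    ${await autoHistogram()}
    ${await autoDotZero()}
    ${await autoLineZero()}
    ${/* etc */ ""}
  </div>`;
}
```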

But when I run yarn test, I get:

```
  1) plot autoMatrix:
     TypeError: Cannot read properties of null (reading 'replace')
      at reindexStyle (file:///Users/toph/Development/plot/test/plot.js:74:62)
      at file:///Users/toph/Development/plot/test/plot.js:16:5
      at async Context.<anonymous> (file:///Users/toph/Development/plot/test/jsdom.js:29:14)
```

That's because test/plot.js expects every top-level node to be a plot. If you think it'd be valuable to have, I can try to make that work, but for now I took it out.

tophtucker · 2023-03-23T04:34:52Z

src/marks/auto.js

```diff
+  // Greedily materialize columns for type inference; we’ll need them anyway to
+  // plot! Note that we don’t apply any type inference to the fx and fy
+  // channels, if present; these are always ordinal (at least for now).
+  const {x, y, color, size} = options;
+  const X = materializeValue(data, x);
+  const Y = materializeValue(data, y);
+  const C = materializeValue(data, color);
+  const S = materializeValue(data, size);
```


Moved here from the top of auto, which was passing these into autoSpec.

I think we still want to do this materialization (also) inside of auto, so that it doesn’t need to be done twice (the second time being when we call auto.plot).

Ooh right, we're not passing these materialized values back out of autoImpl… and we don't want to, because autoSpec shouldn't return the materialized values. So I guess auto should greedily materialize and autoImpl shouldn't bc autoSpec should return something cleaner if possible?

I was wrong; it doesn’t materialize twice because autoImpl returns the full markOptions with the already-materialized values. I think we’re good as-is. Nice!
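A minimal sketch of why handing already-materialized values back in doesn't re-read the data, assuming materialization leaves arrays untouched. `materialize` is a hypothetical stand-in, not Plot's materializeValue.

```js
// Hypothetical stand-in for materialization (not Plot's materializeValue):
// field names and accessors become arrays; arrays pass through unchanged.
function materialize(data, value) {
  if (Array.isArray(value)) return value; // already materialized: reuse as-is
  if (typeof value === "string") return Array.from(data, (d) => d[value]);
  if (typeof value === "function") return Array.from(data, value);
  return value; // null, undefined, or a constant
}

let reads = 0;
const data = [{a: 1}, {a: 2}];
const A = materialize(data, (d) => (reads++, d.a)); // [1, 2]; reads === 2
const again = materialize(data, A); // the same array; reads is still 2
```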

tophtucker · 2023-03-23T04:36:16Z

src/marks/auto.js

```diff
  if (xZero === undefined)
-    xZero = X && transform !== binX && (mark === barX || mark === areaX || mark === rectX || mark === ruleY);
+    xZero =
+      X &&
+      transform !== bin &&
+      transform !== binX &&
+      (markImpl === barX || markImpl === areaX || markImpl === rectX || markImpl === ruleY);
```


I added the transform !== bin check for both xZero and yZero, which I think achieves the same thing as checking !colorReduce && !sizeReduce in the earlier implementation, i.e. it fixes the heatmap case.

tophtucker · 2023-03-23T04:36:44Z

src/marks/auto.js

```diff
@@ -211,23 +164,78 @@ export function auto(data, options) {
  if (transform) {
    if (transform === bin || transform === binX) markOptions.x = {value: X, ...xOptions};
    if (transform === bin || transform === binY) markOptions.y = {value: Y, ...yOptions};
-    markOptions = transform(transformOptions, markOptions);
```


Moved down into auto bc it's instantiating

mbostock · 2023-03-23T04:43:04Z

src/marks/auto.js

```diff
+    x: {
+      value: xValue ?? null,
+      reduce: xReduce ?? null,
+      zero: xZero ?? false,
```


Might as well coerce the input to a boolean here, in case the user passed in something like 0?

Suggested change

```diff
-      zero: xZero ?? false,
+      zero: !!xZero,
```

mbostock · 2023-03-23T04:43:18Z

src/marks/auto.js

```diff
+    y: {
+      value: yValue ?? null,
+      reduce: yReduce ?? null,
+      zero: yZero ?? false,
```


Same.

Suggested change

```diff
-      zero: yZero ?? false,
+      zero: !!yZero,
```
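A quick illustration of the difference behind these two suggestions: nullish coalescing only replaces null and undefined, so other falsy inputs like 0 leak through as non-booleans, while `!!` always yields a boolean. The function names are just for illustration.

```js
const viaNullish = (z) => z ?? false;
const viaBang = (z) => !!z;

viaNullish(undefined); // false
viaNullish(0); // 0 (not a boolean)
viaBang(0); // false
viaBang(1); // true
```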

mbostock · 2023-03-23T04:48:36Z

src/marks/auto.js

```diff
+    colorMode
+  } = autoImpl(data, options);
+
+  if (transform) markOptions = transform(transformOptions, markOptions);
```


We could fold this into autoImpl and then autoImpl still only needs to return markImpl and markOptions—not also transformImpl and transformOptions. Not necessary, though… 🤔

yeah i wasn't sure if there was some explicit benefit to not instantiating — like, if it could potentially have side effects, or be slower. also might depend on whether we'd want autoSpec to be able to report any info on transforms?
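A hypothetical sketch of the folded alternative, where autoImpl applies the transform itself and returns only markImpl and markOptions. Everything here is illustrative: `decide` stands in for the mark/transform inference and `toyTransform` for a real transform such as binX. It makes the trade-off visible: the result no longer carries transformImpl or transformOptions, so anything built on it (like autoSpec) can't report transform info without running the transform.

```js
// Hypothetical: autoImpl with the transform folded in. Not Plot's API.
function autoImplFolded(data, options, decide) {
  const {markImpl, markOptions, transformImpl, transformOptions} = decide(data, options);
  return {
    markImpl,
    markOptions: transformImpl ? transformImpl(transformOptions, markOptions) : markOptions
  };
}

// Toy stand-ins: a "transform" that just records the reducer it was given.
const toyTransform = (outputs, options) => ({...options, reduced: outputs.y});
const decide = (data, options) => ({
  markImpl: "barY",
  markOptions: {x: options.x},
  transformImpl: options.reduce ? toyTransform : null,
  transformOptions: {y: options.reduce}
});

autoImplFolded([], {x: "letter", reduce: "count"}, decide);
// {markImpl: "barY", markOptions: {x: "letter", reduced: "count"}}
```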

mbostock

Nice work! 👏👏

Fil · 2023-03-23T07:52:37Z

To fix the autoMatrix test we just need this tiny change:

```diff
--- a/test/plot.js
+++ b/test/plot.js
@@ -71,7 +71,7 @@ function reindexStyle(root) {
     const parent = style.parentNode;
     const uid = parent.getAttribute("class");
     for (const child of [parent, ...parent.querySelectorAll("[class]")]) {
-      child.setAttribute("class", child.getAttribute("class").replace(new RegExp(`\\b${uid}\\b`, "g"), name));
+      child.setAttribute("class", child.getAttribute("class")?.replace(new RegExp(`\\b${uid}\\b`, "g"), name));
     }
     style.textContent = style.textContent.replace(new RegExp(`[.]${uid}`, "g"), `.${name}`);
   }
```
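The original error is .replace being called on null: getAttribute returns null for an element without a class attribute. One detail that may be worth checking: querySelectorAll("[class]") only matches elements that have a class, so the null can only come from parent itself, and with `?.` that element gets setAttribute("class", undefined), which the DOM stringifies to "undefined". Below is a sketch of a guard that skips such elements instead. It uses a hypothetical fake element so it runs without jsdom, and `renameClass` is an illustrative name, not a helper in test/plot.js.

```js
// Hypothetical stand-in for a DOM element. Like the DOM, getAttribute returns
// null for a missing attribute and setAttribute stringifies its value.
function fakeElement(cls) {
  const attrs = cls == null ? {} : {class: cls};
  return {
    getAttribute: (name) => (name in attrs ? attrs[name] : null),
    setAttribute: (name, value) => {
      attrs[name] = String(value);
    }
  };
}

// Illustrative guard: skip elements without a class instead of writing back.
function renameClass(child, uid, name) {
  const cls = child.getAttribute("class");
  if (cls == null) return;
  child.setAttribute("class", cls.replace(new RegExp(`\\b${uid}\\b`, "g"), name));
}

const bare = fakeElement(null);
bare.setAttribute("class", bare.getAttribute("class")?.replace(/x/g, "y"));
console.log(bare.getAttribute("class")); // the string "undefined"

const guarded = fakeElement(null);
renameClass(guarded, "plot-abc", "plot");
console.log(guarded.getAttribute("class")); // null
```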

tophtucker · 2023-03-23T13:56:32Z

src/marks/auto.js

```diff
-      transform !== bin &&
-      transform !== binX &&
+      !(transformImpl === bin || transformImpl === binX) &&
```


Is the point of this change that they're mutually exclusive and !(a || b) can short-circuit faster than !a && !b or something?

I just found it more readable by grouping the related checks.
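For what it's worth, the two forms are equivalent by De Morgan's law and short-circuit identically, so there's no speed difference either way. The check below counts operand evaluations for all four input combinations; the toy thunks stand in for the transformImpl comparisons.

```js
// Evaluate an expression built from two operand thunks, counting how many
// operands actually ran.
function evaluate(expr, aValue, bValue) {
  let calls = 0;
  const a = () => (calls++, aValue);
  const b = () => (calls++, bValue);
  const value = expr(a, b);
  return {value, calls};
}

const grouped = (a, b) => !(a() || b());
const separate = (a, b) => !a() && !b();

const pairs = [[false, false], [false, true], [true, false], [true, true]];
const allMatch = pairs.every(([av, bv]) => {
  const g = evaluate(grouped, av, bv);
  const s = evaluate(separate, av, bv);
  return g.value === s.value && g.calls === s.calls;
});
console.log(allMatch); // true
```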

tophtucker · 2023-03-23T14:00:42Z

> To fix the autoMatrix test we just need this tiny change

Oooh, easy, thanks Fil! I'm gonna make that a separate PR because I also wanna think about which plots should actually be in the matrix and whether it makes sense to have that kind of redundancy. (Feels a little weird that the matrix is useful for looking at the live results, but doesn't add any coverage for the automated testing.)

And thanks Mike for lots of good lil edits! 🙏

mbostock · 2023-03-23T14:41:53Z

I don’t think we should add the redundant matrix test. That was just an idea for debugging this problem.

…q#1368)

* Auto mark: never render rect; move zero-ness to autoSpec
* update test artifacts
* fix some tests; only set zero on a dimension if that dimension is defined
* Update src/marks/auto.js

  Co-authored-by: Mike Bostock <[email protected]>

* dont set zero-ness if colorReduce
* fix some tests
* prettier
* just committing the state after pairing so i have it
* revert auto file
* re-fix the original motivating bugs, i think
* autoImpl
* rm autoplot matrix test bc it didnt work with test runner
* transformImpl; coerce zero; sort imports; const
* normalize mark option

---------

Co-authored-by: Mike Bostock <[email protected]>

tophtucker added 3 commits March 21, 2023 18:59

Auto mark: never render rect; move zero-ness to autoSpec

261594c

update test artifacts

f7d100f

fix some tests; only set zero on a dimension if that dimension is defined

6ac8928

mbostock reviewed Mar 21, 2023

src/marks/auto.js (outdated)

mbostock reviewed Mar 21, 2023

tophtucker and others added 4 commits March 21, 2023 21:06

Update src/marks/auto.js

3a375ba

Co-authored-by: Mike Bostock <[email protected]>

dont set zero-ness if colorReduce

9f5b78d

Merge branch 'toph/never-rect' of https://github.com/observablehq/plot into toph/never-rect

81b5c17

fix some tests

2907f2f

prettier

fcce38b

tophtucker added 5 commits March 22, 2023 19:42

just committing the state after pairing so i have it

0b3efd3

revert auto file

31ba146

re-fix the original motivating bugs, i think

455aeeb

autoImpl

4c970d0

rm autoplot matrix test bc it didnt work with test runner

1ab57ea

tophtucker marked this pull request as ready for review March 23, 2023 04:33

tophtucker commented Mar 23, 2023

mbostock reviewed Mar 23, 2023

mbostock added 2 commits March 22, 2023 22:00

transformImpl; coerce zero; sort imports; const

d66d501

normalize mark option

d5b1339

mbostock approved these changes Mar 23, 2023

tophtucker commented Mar 23, 2023

tophtucker merged commit 640e3f9 into main Mar 23, 2023

tophtucker deleted the toph/never-rect branch March 23, 2023 14:00
