-
Notifications
You must be signed in to change notification settings - Fork 185
layouts: dodge, hexbin #775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Here’s a pattern that I just used, if it helps. First, checkout the main branch (make sure you’re up-to-date):
Then create a new, temporary branch:
Then squash merge the existing branch
Commit after resolving differences:
Then switch back to the feature branch:
Reset it to the squashed merge:
Push it up to GitHub:
And lastly clean up the temporary branch:
This workflow will squash any branch down to a single commit, minimizing the number of conflicts you need to resolve. |
Thanks! fwiw the relevant diff with #648 is the following: diff --git a/src/layouts/dodge.js b/src/layouts/dodge.js
index 5e737151..862a9c9d 100644
--- a/src/layouts/dodge.js
+++ b/src/layouts/dodge.js
@@ -1,7 +1,7 @@
import {max} from "d3";
import IntervalTree from "interval-tree-1d";
-import {maybeNumberChannel} from "../options.js";
import {layout} from "./index.js";
+import {finite, positive} from "../defined.js";
const anchorXLeft = ({marginLeft}) => [1, marginLeft];
const anchorXRight = ({width, marginRight}) => [-1, width - marginRight];
@@ -39,43 +39,48 @@ export function dodgeY(dodgeOptions = {}, options = {}) {
}
function dodge(y, x, anchor, padding, options) {
- const [, r] = maybeNumberChannel(options.r, 3);
- return layout(options, (I, scales, {[x]: X, r: R}, dimensions) => {
+ return layout(options, function(index, scales, values, dimensions) {
+ let {[x]: X, [y]: Y, r: R} = values;
+ const r = R ? undefined : this.r !== undefined ? this.r : options.r !== undefined ? +options.r : 3;
if (X == null) throw new Error(`missing channel: ${x}`);
let [ky, ty] = anchor(dimensions);
const compare = ky ? compareAscending : compareSymmetric;
- if (ky) ty += ky * ((R ? max(I, i => R[i]) : r) + padding); else ky = 1;
- if (!R) R = new Float64Array(X.length).fill(r);
- const Y = new Float64Array(X.length);
- const tree = IntervalTree();
- for (const i of I) {
- const intervals = [];
- const l = X[i] - R[i];
- const r = X[i] + R[i];
-
- // For any previously placed circles that may overlap this circle, compute
- // the y-positions that place this circle tangent to these other circles.
- // https://observablehq.com/@mbostock/circle-offset-along-line
- tree.queryInterval(l - padding, r + padding, ([,, j]) => {
- const yj = Y[j];
- const dx = X[i] - X[j];
- const dr = R[i] + padding + R[j];
- const dy = Math.sqrt(dr * dr - dx * dx);
- intervals.push([yj - dy, yj + dy]);
- });
-
- // Find the best y-value where this circle can fit.
- for (let y of intervals.flat().sort(compare)) {
- if (intervals.every(([lo, hi]) => y <= lo || y >= hi)) {
- Y[i] = y;
- break;
+ if (ky) ty += ky * ((R ? max(index.flat(), i => R[i]) : r) + padding); else ky = 1;
+ if (!R) R = values.r = new Float64Array(X.length).fill(r);
+ if (!Y) Y = values[y] = new Float64Array(X.length);
+ for (let I of index) {
+ const tree = IntervalTree();
+ I = I.filter(i => finite(X[i]) && positive(R[i]));
+ for (const i of I) {
+ const intervals = [];
+ const l = X[i] - R[i];
+ const r = X[i] + R[i];
+
+ // For any previously placed circles that may overlap this circle, compute
+ // the y-positions that place this circle tangent to these other circles.
+ // https://observablehq.com/@mbostock/circle-offset-along-line
+ tree.queryInterval(l - padding, r + padding, ([,, j]) => {
+ const yj = Y[j];
+ const dx = X[i] - X[j];
+ const dr = R[i] + padding + R[j];
+ const dy = Math.sqrt(dr * dr - dx * dx);
+ intervals.push([yj - dy, yj + dy]);
+ });
+
+ // Find the best y-value where this circle can fit.
+ for (let y of intervals.flat().sort(compare)) {
+ if (intervals.every(([lo, hi]) => y <= lo || y >= hi)) {
+ Y[i] = y;
+ break;
+ }
}
+
+ // Insert the placed circle into the interval tree.
+ tree.insert([l, r, i]);
}
-
- // Insert the placed circle into the interval tree.
- tree.insert([l, r, i]);
+ for (const i of I) Y[i] = Y[i] * ky + ty;
}
- return {[y]: Y.map(y => y * ky + ty)};
+ return {index, values};
});
}
|
09d164d
to
3971461
Compare
Co-authored-by: Mike Bostock <[email protected]>
3971461
to
d6f750f
Compare
let [ky, ty] = anchor(dimensions); | ||
const compare = ky ? compareAscending : compareSymmetric; | ||
if (ky) ty += ky * ((R ? max(index.flat(), i => R[i]) : r) + padding); else ky = 1; | ||
if (!R) R = values.r = new Float64Array(X.length).fill(r); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mutating an input argument (values). Watch out for this! (I’ll fix.)
scaleDescriptors = {...newScaleDescriptors, ...scaleDescriptors}; | ||
scales = ScaleFunctions(scaleDescriptors); | ||
for (const layout of layouts) { | ||
if (layout.channels) layout.values = {...layout.values, ...applyScales(layout.channels, scales)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly this is mutating layout, which was returned by mark.layout; we shouldn’t mutate an object we didn’t create (e.g. the mark might use memoization).
for (const [, channel] of channels) { | ||
const {scale} = channel; | ||
if (scale !== undefined) { | ||
const {percent, transform = percent ? x => x * 100 : undefined} = options[scale] || {}; | ||
if (transform != null) channel.value = Array.from(channel.value, transform); | ||
if (scaleChannels.has(scale)) scaleChannels.get(scale).push(channel); | ||
else scaleChannels.set(scale, [channel]); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is duplicating code from Plot.plot:
Lines 41 to 49 in 352d086
for (const [, channel] of channels) { | |
const {scale} = channel; | |
if (scale !== undefined) { | |
const {percent, transform = percent ? x => x * 100 : undefined} = options[scale] || {}; | |
if (transform != null) channel.value = Array.from(channel.value, transform); | |
if (scaleChannels.has(scale)) scaleChannels.get(scale).push(channel); | |
else scaleChannels.set(scale, [channel]); | |
} | |
} |
const newOptions = {}; | ||
for (let i = 0; i < marks.length; ++i) { | ||
const mark = marks[i]; | ||
const {channels} = layouts[i] = applyLayout.call(mark, mark.layout, [markIndex.get(mark)], scaleDescriptors, markChannels.get(mark), dimensions, options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is ignoring the index and values returned by mark.layout. I think I follow how the values are being ignored because the code currently relies on mutation (https://github.com/observablehq/plot/pull/775/files#r818901600). I’m a little surprised that it’s ignoring the index though, here. We don’t ignore the index in the faceting case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, never mind. The layout is stored in layouts and then layout.index is referenced later.
(I’m mostly making notes to myself as I go through this code. You don’t need to respond to any of the comments above but we can talk about them soon!) |
let index = markIndex.get(mark); | ||
if (mark.filter != null) index = mark.filter(index, channels, values); | ||
const node = mark.render(index, scales, values, dimensions, axes); | ||
let {index: [I], values} = layouts[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that layout.index is nested, we should probably call it layout.facets instead of layout.index. We make a similar distinction in that mark.transform returns {data, facets} (rather than {data, index}), and when the mark is not faceted, then facets is [index].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though it’s also the case, confusingly, that mark.initialize always returns {index, channels} and index may either be a flat array for non-faceted marks or a nested array for faceted marks. (Though markIndex will always contain a flat index.)
function applyLayout(layout, index, scaleDescriptors, channels, dimensions, options) { | ||
const scales = ScaleFunctions(scaleDescriptors); | ||
const values = channels ? applyScales(channels, scales) : {}; | ||
return layout ? layout.call(this, index, scaleDescriptors, values, dimensions, options) : {index, values}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is passing the entire options object to mark.layout. I don’t think we want to expose that much information to the layout—it risks making the layouts unwieldy because their behavior could be dependent on anything in the plot specification. It looks like this options is only needed for faceting (not by our current hexbin and dodge layouts), so, I suspect we’ll need to find a different way to implement faceted layouts to avoid exposing options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also passing scaleDescriptors to mark.layout, but I think we want scales (i.e., scale functions) for symmetry with mark.render. In any case the scales are not used by the current layouts.
const channels = []; | ||
const newIndex = []; | ||
for (const o of outputs) { | ||
o.initialize(this.data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is referencing mark.data, rather than whatever the possible prior mark.transform returned as data during mark.initialize. Probably this means we’ll need to pass the (possibly transformed) data to the layout rather than expecting layouts to reference this.data.
const {channels} = layouts[i] = applyLayout.call(mark, mark.layout, [markIndex.get(mark)], scaleDescriptors, markChannels.get(mark), dimensions, options); | ||
if (channels) { | ||
for (const [, {scale, options: hint}] of channels) { | ||
if (scale) newOptions[scale] = {...hint, ...options[scale]}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have multiple layouts that define multiple channels for a given scale, or even a single layout that defines multiple channels for the same scale, this won’t compose the scale options; instead, it’s “last one wins”, i.e., the last channel definition for a given scale is respected and the scale options defined by other channels produced by layouts are ignored. Probably we need to accumulate into newOptions. But I’m also curious if we can sidestep this approach entirely by using channel hints rather than letting layouts inject scale options directly.
@@ -279,17 +327,57 @@ class Facet extends Mark { | |||
} | |||
marksChannels.push(markChannels); | |||
} | |||
this.layout = function(index, scaleDescriptors, values, dimensions, options) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn’t assign this method as a side-effect of calling facet.initialize; it should be a method on the Facet class.
const newScaleDescriptors = Scales(newScaleChannels, newOptions); | ||
Object.assign(scaleDescriptors, newScaleDescriptors, scaleDescriptors); | ||
const rescales = {...ScaleFunctions(newScaleDescriptors), ...scales}; | ||
for (let i = 0; i < marks.length; ++i) { | ||
marksValues[i] = {...marksValues[i], ...layouts[i].values, ...applyScales(layouts[i].channels || [], rescales)}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code for facets is closely related to this non-faceted code, and it would be nice to unify them somehow:
Lines 117 to 122 in 352d086
const newScaleDescriptors = Scales(newScaleChannels, newOptions); | |
scaleDescriptors = {...newScaleDescriptors, ...scaleDescriptors}; | |
scales = ScaleFunctions(scaleDescriptors); | |
for (const layout of layouts) { | |
if (layout.channels) layout.values = {...layout.values, ...applyScales(layout.channels, scales)}; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This is the case for a lot of code, and I've wondered several times if we should not route everything through the faceted code, even when there is no faceting (the non-faceted case being just 1 facet with all the data, and null fx/fy). This might also help to clear up the difference between faceted and non-faceted indexes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe! I tried before and the current state in main is the best I could come up with, but there’s always room for improvement. 😁 A little code duplication isn’t harmful, especially if it’s not visible externally, and there are usually ways to extract shared logic into shared helper functions to minimize disruption. This was just a reminder to look for a way to reduce the duplication, if possible.
(I’ve broken faceting in my working branch, but I’ll get back to it soon.)
// * applying the scales, possibly pushing a new one (for legends) | ||
// * returning a filtered index and values to render, possibly new channels to rescale | ||
const newScaleChannels = new Map(); | ||
const layouts = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This array overlaps with markIndex: layouts[i].index is equal to markIndex.get(marks[i]) except in the case where the layout returned a new index. We should ideally normalize the state so there are no duplicate (and obsolete or inconsistent) representations.
|
||
function composeLayout(l1, l2) { | ||
return function(index, scales, values, dimensions) { | ||
values = l1.call(this, index, scales, values, dimensions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under the redesigned signature, mark.layout returns {index, values, channels}, so we don’t want to assign to values here. Similarly the implementation of partialLayout will need to change.
import {maybeOutputs} from "../transforms/group.js"; | ||
|
||
const defaults = { | ||
ariaLabel: "hex", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The defaults here mean something different than they do for marks, where they are passed as a separate argument to the Mark constructor: these defaults mean default options. If you specify {ariaLabel: "hex"}, it means that you want an ariaLabel channel derived from the “hex” field (d => d.hex
) of data. When I fixed mark filtering this was causing all the marks to be dropped because the ariaLabel channel values were [undefined, undefined, undefined, …]. Oops! 😄
const newIndex = []; | ||
for (const o of outputs) { | ||
o.initialize(this.data); | ||
o.scope("data", index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call shouldn’t be necessary—calling output.initialize will invoke the reducer:
Lines 177 to 179 in 2b59461
if (reducer.scope === "data") { | |
context = reducer.reduce(range(data), V); | |
} |
Also here index is a nested array (like facets).
const newScaleDescriptors = Scales(newScaleChannels, newOptions); | ||
scaleDescriptors = {...newScaleDescriptors, ...scaleDescriptors}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we’re constructing new scales here, we have to be careful because it’s possible that a layout can define new channels for a scale that already existed and had existing channels. This code only considers the new channels when creating new scales, but there might be some existing channels, too—we need to combine the new and old channels together for each scale that needs to be reconstructed after computing the layouts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On this question I took the view that applying a layout would not recompute the values for the older channels (they can operate on their mark only), and would not be allowed to modify an existing scale once it's been instantiated (in order to avoid any loop if one layout derives color from x, and the other derives x from color). Maybe overly simplistic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see, it computes the new scale but then it gets overwritten by the old definition because ...scaleDescriptors
is second. I missed that.
I’m proposing instead a new scale is derived from the union of old and new channels, just like if you added another mark to a plot, replacing the old scale definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it’s not quite working like I expect. I wonder what I’m missing. For the hexbinDot example, there’s an old channel associated with the r scale that represents the input to the hexbin:
[3750, 3800, 3250, NaN, 3450, 3650, 3625, 4675, 3475, 4250, …]
And the new channel associated with the r scale represents the output of the hexbin:
[24, 25, 26, 4, 7, 4, 3, 10, 1, 1, 23, …]
The old channel probably shouldn’t be associated with the r scale since it’s unscaled. (The numbers represent the mass of penguins in grams, not radii in pixels.) I need to dig in a bit deeper to figure out what’s going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think the way to think about this is that the r channel associated with the dot mark is getting replaced by the hexbin layout. So we don’t want to union the two channels here—we just want the new channel. But we’ll need to carefully track which channels are getting replaced (same name, same mark) to determine the surviving set of channels associated with each scale.
values[o.name] = o.output.transform(); | ||
} | ||
} | ||
if (!channels.find(([key]) => key === "r")) values.r = Array.from(values.x).fill(radius); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than constructing an r channel when the radius is constant, we can instead populate the r option as a constant up above.
const bins = hbin(I, X, Y, radius); | ||
for (const o of outputs) { | ||
o.scope("facet", I); | ||
for (const bin of bins) o.reduce(bin); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are typically many more bins than outputs, so I suspect it’s more efficient to do another for (const o of outputs)
inside the for (const bin of bins)
loop below, rather than another for (const bin of bins)
loop here.
The hexbin layout will probably need an optional group by z, like the bin transform does. This would let you have overlapping hexagons of different colors, for example. Related #197. |
|
||
export function hexbin(outputs, options) { | ||
([outputs, options] = mergeOptions(outputs, options)); | ||
const {radius, ...inputs} = options; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For both hexbin and hexgrid, I think we’ll want to use r instead of radius for consistency with dot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it’s complicated. We can’t just use r because you might want to use the r channel, e.g.,
Plot.hexbin({r: "sum"}, {x: "culmen_depth_mm", y: "culmen_length_mm", r: "body_mass_g"})
so you need a separate option to specify the size of the hexagonal grid.
At the same time, though, it would really be nice to say:
Plot.hexbin({fill: "count"}, {x: "culmen_depth_mm", y: "culmen_length_mm", r: 20})
Perhaps we can find a way to make both work.
} | ||
|
||
// Allow hexbin options to be specified as part of outputs; merge them into options. | ||
function mergeOptions({radius = 10, ...outputs}, options) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the default radius of 10 is defined this way, then no default will be provided if options says something like {radius: undefined}.
Something to think about in the case of the hexbin transform: the scales get computed twice. For example when you say this: Plot.dot(penguins, Plot.hexbin({r: "count"}, {x: "culmen_depth_mm", y: "culmen_length_mm", r: "body_mass_g"})) The first r scale is computed given individual observations (e.g., 344 values representing body mass in grams of penguins). Then the hexbin layout produces a new r channel for the hexagonal bins (e.g., 43 values representing counts of observations in each bin). The r scale is then reconstructed using this new channel and the original scale definition is discarded. It’d be nice if somehow we could avoid the construction of the first r scale since it won’t ever be used. And likewise there’s no reason to materialize the scaled r channel values using this first r scale. But we won’t know that the r channel will be replaced until the layout is invoked. And we don’t know what channels the layout might consume. So, I guess it’s unavoidable in our current design, but it’d be slightly better maybe if the layout could declare statically what channels it consumes and produces so that we could avoid unnecessary computation. |
Related to this, the hexbin layout should function more similarly to the bin transform in that it should define a transform option. For example if you look at the transformed data that is generated by the bin transform, you get this (i.e., nested arrays of the input data):
But since the hexbin layout doesn’t similarly define the transform option, the data is unchanged (the 344 rows of penguin observations). This distinction isn’t currently visible, but it will be when it comes to things like interaction, where we might want to e.g. allow brushing over hexbins. |
I mean, we can’t use the existing transform option since that is invoked too soon—for hexbins we need to wait until after the scales have been constructed and applied to x and y. But it means that layouts in effect will need to be a superset of transforms, and that means they need to be able to return new data and not just a new index. |
for (const o of outputs) { | ||
if (o.name in rescales) { | ||
const {scale, options} = rescales[o.name]; | ||
const value = o.output.transform(); | ||
channels.push([o.name, {scale, value, options}]); | ||
} else { | ||
values[o.name] = o.output.transform(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is using the “materialized” representation of a channel, similar to that returned by the Channel constructor, rather than the channel “descriptors” that are used e.g. by the Mark constructor and mark.channels. Channel descriptors are materialized inside of the mark.initialize, given data. I wonder if it would be more appropriate to use the “channel descriptor” representation here, rather than having layouts return materialized channels, so as to avoid introducing a new public signature for channel objects.
This is the sort of thing that makes me wish we used TypeScript, since we’d be forced to be explicit about the shapes of these objects. We’ll get there eventually I guess. 😄
Superseded by #801. 😅 |
supersedes #648 #711 #713 #740
There were too many conflicts to rebase on #648. The main difference is that instead of acting only on the values of one facet, a layout now receives the indices of all facets, and must return the index, (scaled) values, and possibly new channels that will be scaled in a second pass. It is allowed to hint at the scale defaults it wants (but they are always overrided if the scale already exists because of another mark).
Fixes #164.