layouts: dodge, hexbin #775

Closed
wants to merge 7 commits into from
79 changes: 75 additions & 4 deletions README.md
@@ -72,9 +72,9 @@ When drawing a single mark, you can call *mark*.**plot**(*options*) as shorthand
```js
Plot.barY(alphabet, {x: "letter", y: "frequency"}).plot()
```
### Layout options
### Geometry options

These options determine the overall layout of the plot; all are specified as numbers in pixels:
These options determine the overall geometry of the plot; all are specified as numbers in pixels:

* **marginTop** - the top margin
* **marginRight** - the right margin
@@ -948,6 +948,14 @@ Plot.dotY(cars.map(d => d["economy (mpg)"]))

Equivalent to [Plot.dot](#plotdotdata-options) except that if the **y** option is not specified, it defaults to the identity function and assumes that *data* = [*y₀*, *y₁*, *y₂*, …].

### Hexgrid

The hexgrid mark draws the hexagonal mesh used by the [hexbin](#hexbin) layout, and is intended to accompany marks that use that layout.

#### Plot.hexgrid([*options*])

The *radius* option specifies the radius of the hexagonal mesh, in pixels (defaults to 10). The *clip* option is set, by default, to clip the mark to the frame’s dimensions.

### Image

[<img src="./img/image.png" width="320" height="198" alt="a scatterplot of Presidential portraits">](https://observablehq.com/@observablehq/plot-image)
@@ -1435,10 +1443,11 @@ The following aggregation methods are supported:
* *pXX* - the percentile value, where XX is a number in [00,99]
* *deviation* - the standard deviation
* *variance* - the variance per [Welford’s algorithm](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm)
* *x* - the middle the bin’s *x*-extent (when binning on *x*)
* *mode* - the value with the most occurrences
* *x* - the middle of the bin’s *x*-extent (when binning on *x*)
* *x1* - the lower bound of the bin’s *x*-extent (when binning on *x*)
* *x2* - the upper bound of the bin’s *x*-extent (when binning on *x*)
* *y* - the middle the bin’s *y*-extent (when binning on *y*)
* *y* - the middle of the bin’s *y*-extent (when binning on *y*)
* *y1* - the lower bound of the bin’s *y*-extent (when binning on *y*)
* *y2* - the upper bound of the bin’s *y*-extent (when binning on *y*)
* a function to be passed the array of values for each bin and the extent of the bin
@@ -1937,6 +1946,68 @@ This helper for constructing derived channels returns a [*channel*, *setChannel*

Plot.channel is typically used by options transforms to define new channels; these channels are populated (derived) when the custom *transform* function is invoked.

## Layouts

A layout processes the transformed and scaled values of a mark before rendering. A layout might, for example, modify the marks’ positions to avoid occlusion. A layout operates in representation space (such as pixels and colors, *i.e.* after scales have been applied) rather than data space.

### Dodge

The dodge layout can be applied to any mark that consumes *x* or *y*, such as the Dot, Image, Text and Vector marks.

#### Plot.dodgeY([*layoutOptions*, ]*options*)

```js
Plot.dodgeY({x: "date"})
```

If the marks are arranged along the *x* axis, the dodgeY layout piles them vertically, keeping their *x* position unchanged, and creating a *y* position that avoids overlapping.

#### Plot.dodgeX([*layoutOptions*, ]*options*)

```js
Plot.dodgeX({y: "value"})
```

Equivalent to Plot.dodgeY, but the piling is horizontal, keeping the marks’ *y* position unchanged, and creating an *x* position that avoids overlapping.

The dodge layouts accept the following layout options:

* **padding** - a number of pixels added to the radius of the mark to estimate its size
* **anchor** - the layout’s anchor: one of *middle*, *right*, and *left* (default) for dodgeX, and one of *middle*, *top*, and *bottom* (default) for dodgeY.
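
As a rough illustration of the piling described above, here is a minimal sketch that runs in quadratic time and assumes a shared radius. The `pileY` helper is hypothetical and is not part of Plot; it always prefers the lowest free offset from the baseline, which is a simplification of the dodge layout rather than a copy of it.

```js
// Hypothetical helper, not part of Plot: a quadratic-time sketch of dodgeY-style piling.
// X: pixel x-positions; r: shared radius; padding: extra gap between circles.
function pileY(X, r = 3, padding = 1) {
  const Y = new Float64Array(X.length);
  const placed = []; // indices of circles already positioned
  const d = 2 * r + padding; // minimum center-to-center distance
  for (let i = 0; i < X.length; ++i) {
    // Each placed circle that is close enough in x forbids an open interval of y-values.
    const intervals = [];
    for (const j of placed) {
      const dx = X[i] - X[j];
      if (Math.abs(dx) < d) {
        const dy = Math.sqrt(d * d - dx * dx);
        intervals.push([Y[j] - dy, Y[j] + dy]);
      }
    }
    // Candidates: the baseline (0) and every interval endpoint at or above it.
    const candidates = [0, ...intervals.flat().filter(y => y >= 0)].sort((a, b) => a - b);
    // The first candidate outside every forbidden interval is the lowest free position.
    Y[i] = candidates.find(y => intervals.every(([lo, hi]) => y <= lo || y >= hi));
    placed.push(i);
  }
  return Y; // offsets from the baseline; an anchor would map these to pixel positions
}
```

For example, `pileY([0, 0, 100])` leaves the first and third circles on the baseline and lifts the second by one diameter plus padding.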

### Hexbin

The hexbin layout can be applied to any mark that consumes *x* and *y*, such as the Dot, Image, Text and Vector marks. It aggregates the values into hexagonal bins of the given *radius* (in pixel space), and computes new values *x* and *y* as the centers of each bin. It can also return new channels by applying a reducer to each bin, such as the number of elements in the bin.

#### Plot.hexbin(*outputs*, *options*)

[Source](./src/layouts/hexbin.js) · [Examples](https://observablehq.com/@observablehq/plot-hexbin) · Aggregates the given inputs into hexagonal bins, and creates output channels with the reduced data. The options must specify the *x* and *y* channels, and can indicate the *radius* in pixels of the hexagonal lattice (defaults to 10). The *outputs* options are similar to Plot.bin’s outputs; each output channel receives as input, for each hexagon, the subset of the data which has been matched to its center. The *outputs* object specifies the aggregation method for each output channel. The following aggregation methods are supported:

* *first* - the first value, in input order
* *last* - the last value, in input order
* *count* - the number of elements (frequency)
* *distinct* - the number of distinct values
* *sum* - the sum of values
* *proportion* - the sum proportional to the overall total (weighted frequency)
* *proportion-facet* - the sum proportional to the facet total
* *min* - the minimum value
* *min-index* - the zero-based index of the minimum value
* *max* - the maximum value
* *max-index* - the zero-based index of the maximum value
* *mean* - the mean value (average)
* *median* - the median value
* *deviation* - the standard deviation
* *variance* - the variance per [Welford’s algorithm](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm)
* *mode* - the value with the most occurrences
* a function to be passed the array of values for each bin and the extent of the bin
* an object with a *reduce* method

See also the [hexgrid](#hexgrid) mark.
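
To make the binning step concrete, the following sketch snaps pixel positions to the nearest center of a hexagonal lattice and counts the points in each hexagon. The `hexCenter` and `hexCount` helpers are hypothetical and not part of Plot; the lattice spacing is assumed to match a pointy-topped grid of the given *radius*.

```js
// Hypothetical helper, not part of Plot: snap a pixel position to the nearest
// center of a pointy-topped hexagonal lattice of the given radius.
function hexCenter(x, y, radius = 10) {
  const dx = radius * Math.sqrt(3); // horizontal spacing between centers
  const dy = radius * 1.5; // vertical spacing between rows
  const j0 = Math.round(y / dy);
  let best;
  // Check the nearest row and its two neighbors; odd rows are shifted by dx / 2.
  for (const j of [j0 - 1, j0, j0 + 1]) {
    const shift = (j & 1) / 2;
    const i = Math.round(x / dx - shift);
    const cx = (i + shift) * dx;
    const cy = j * dy;
    const d2 = (x - cx) ** 2 + (y - cy) ** 2;
    if (!best || d2 < best.d2) best = {x: cx, y: cy, d2};
  }
  return [best.x, best.y];
}

// Count points per hexagon, mimicking what a "count" reducer would report.
function hexCount(points, radius = 10) {
  const bins = new Map();
  for (const [x, y] of points) {
    const [cx, cy] = hexCenter(x, y, radius);
    const key = `${cx}|${cy}`;
    const bin = bins.get(key) ?? {x: cx, y: cy, count: 0};
    bin.count++;
    bins.set(key, bin);
  }
  return [...bins.values()];
}
```

Each returned bin carries the hexagon center as *x* and *y*, which is the role the new *x* and *y* channels play in the layout.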

### Custom layouts

When a mark’s *options* have a *layout* property, the layout function is called after the data has been faceted and scaled. It receives as inputs the index of the elements to lay out, the scale descriptors, the values (the scaled channels, as an object mapping each channel name to an array), and the dimensions, with the mark as *this*. It must return the index, the values, and any channels that need to be scaled in a second pass.
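
A custom layout following the signature described above might look like the sketch below. The `shiftY` function and the stand-in inputs are hypothetical, and returning an empty *channels* array when nothing needs rescaling is an assumption rather than documented behavior.

```js
// A hypothetical custom layout: it shifts every scaled y-position by a fixed
// number of pixels and leaves everything else untouched.
function shiftY(pixels) {
  return function (index, scales, values, dimensions) {
    const {y: Y} = values;
    if (Y == null) throw new Error("missing channel: y");
    // Copy rather than mutate the scaled channel passed in.
    const shifted = Float64Array.from(Y, (y) => y + pixels);
    return {index, values: {...values, y: shifted}, channels: []};
  };
}

// Stand-in inputs for illustration; Plot would supply these when rendering.
const index = [[0, 1, 2]]; // a single facet
const values = {x: [10, 20, 30], y: [100, 110, 120]};
const dimensions = {width: 640, height: 400, marginTop: 20, marginRight: 20, marginBottom: 30, marginLeft: 40};
const result = shiftY(5).call({/* the mark */}, index, {}, values, dimensions);
```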

## Curves

A curve defines how to turn a discrete representation of a line as a sequence of points [[*x₀*, *y₀*], [*x₁*, *y₁*], [*x₂*, *y₂*], …] into a continuous path; *i.e.*, how to interpolate between points. Curves are used by the [line](#line), [area](#area), and [link](#link) marks, and are implemented by [d3-shape](https://github.com/d3/d3-shape/blob/master/README.md#curves).
2 changes: 2 additions & 0 deletions package.json
@@ -35,6 +35,7 @@
},
"sideEffects": false,
"devDependencies": {
"@rollup/plugin-commonjs": "^21.0.1",
"@rollup/plugin-json": "4",
"@rollup/plugin-node-resolve": "13",
"canvas": "2",
@@ -50,6 +51,7 @@
},
"dependencies": {
"d3": "^7.3.0",
"interval-tree-1d": "1",
"isoformat": "0.2"
},
"engines": {
2 changes: 2 additions & 0 deletions rollup.config.js
@@ -1,5 +1,6 @@
import fs from "fs";
import {terser} from "rollup-plugin-terser";
import commonjs from "@rollup/plugin-commonjs";
import json from "@rollup/plugin-json";
import node from "@rollup/plugin-node-resolve";
import * as meta from "./package.json";
@@ -25,6 +26,7 @@ const config = {
banner: `// ${meta.name} v${meta.version} Copyright ${copyrights.join(", ")}`
},
plugins: [
commonjs(),
json(),
node()
]
3 changes: 3 additions & 0 deletions src/index.js
@@ -6,6 +6,7 @@ export {boxX, boxY} from "./marks/box.js";
export {Cell, cell, cellX, cellY} from "./marks/cell.js";
export {Dot, dot, dotX, dotY} from "./marks/dot.js";
export {Frame, frame} from "./marks/frame.js";
export {Hexgrid, hexgrid} from "./marks/hexgrid.js";
export {Image, image} from "./marks/image.js";
export {Line, line, lineX, lineY} from "./marks/line.js";
export {Link, link} from "./marks/link.js";
@@ -23,6 +24,8 @@ export {map, mapX, mapY} from "./transforms/map.js";
export {window, windowX, windowY} from "./transforms/window.js";
export {select, selectFirst, selectLast, selectMaxX, selectMaxY, selectMinX, selectMinY} from "./transforms/select.js";
export {stackX, stackX1, stackX2, stackY, stackY1, stackY2} from "./transforms/stack.js";
export {dodgeX, dodgeY} from "./layouts/dodge.js";
export {hexbin} from "./layouts/hexbin.js";
export {formatIsoDate, formatWeekday, formatMonth} from "./format.js";
export {scale} from "./scales.js";
export {legend} from "./legends.js";
93 changes: 93 additions & 0 deletions src/layouts/dodge.js
@@ -0,0 +1,93 @@
import {max} from "d3";
import IntervalTree from "interval-tree-1d";
import {layout} from "./index.js";
import {finite, positive} from "../defined.js";

const anchorXLeft = ({marginLeft}) => [1, marginLeft];
const anchorXRight = ({width, marginRight}) => [-1, width - marginRight];
const anchorXMiddle = ({width, marginLeft, marginRight}) => [0, (marginLeft + width - marginRight) / 2];
const anchorYTop = ({marginTop}) => [1, marginTop];
const anchorYBottom = ({height, marginBottom}) => [-1, height - marginBottom];
const anchorYMiddle = ({height, marginTop, marginBottom}) => [0, (marginTop + height - marginBottom) / 2];

function maybeAnchor(anchor) {
  return typeof anchor === "string" ? {anchor} : anchor;
}

export function dodgeX(dodgeOptions = {}, options = {}) {
  if (arguments.length === 1) [options, dodgeOptions] = [dodgeOptions, options];
  let {anchor = "left", padding = 1} = maybeAnchor(dodgeOptions);
  switch (`${anchor}`.toLowerCase()) {
    case "left": anchor = anchorXLeft; break;
    case "right": anchor = anchorXRight; break;
    case "middle": anchor = anchorXMiddle; break;
    default: throw new Error(`unknown dodge anchor: ${anchor}`);
  }
  return dodge("x", "y", anchor, +padding, options);
}

export function dodgeY(dodgeOptions = {}, options = {}) {
  if (arguments.length === 1) [options, dodgeOptions] = [dodgeOptions, options];
  let {anchor = "bottom", padding = 1} = maybeAnchor(dodgeOptions);
  switch (`${anchor}`.toLowerCase()) {
    case "top": anchor = anchorYTop; break;
    case "bottom": anchor = anchorYBottom; break;
    case "middle": anchor = anchorYMiddle; break;
    default: throw new Error(`unknown dodge anchor: ${anchor}`);
  }
  return dodge("y", "x", anchor, +padding, options);
}

function dodge(y, x, anchor, padding, options) {
  return layout(options, function(index, scales, values, dimensions) {
    let {[x]: X, [y]: Y, r: R} = values;
    const r = R ? undefined : this.r !== undefined ? this.r : options.r !== undefined ? +options.r : 3;
    if (X == null) throw new Error(`missing channel: ${x}`);
    let [ky, ty] = anchor(dimensions);
    const compare = ky ? compareAscending : compareSymmetric;
    if (ky) ty += ky * ((R ? max(index.flat(), i => R[i]) : r) + padding); else ky = 1;
    if (!R) R = values.r = new Float64Array(X.length).fill(r);
Review comment by @mbostock (Member), Mar 3, 2022:

This is mutating an input argument (values). Watch out for this! (I’ll fix.)
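
One way to avoid the mutation flagged here is to build a fresh values object and fill in the missing channels on the copy. The sketch below is only an illustration: `withDefaultChannels` is a hypothetical helper, not the fix that was actually applied.

```js
// Hypothetical helper: return a copy of values with default r and y channels
// filled in, leaving the object passed by the caller untouched.
function withDefaultChannels(values, n, {r = 3, y = "y"} = {}) {
  return {
    ...values,
    r: values.r ?? new Float64Array(n).fill(r),
    [y]: values[y] ?? new Float64Array(n)
  };
}
```

The layout can then read and write the copy, and return it as `values`.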

    if (!Y) Y = values[y] = new Float64Array(X.length);
    for (let I of index) {
      const tree = IntervalTree();
      I = I.filter(i => finite(X[i]) && positive(R[i]));
      for (const i of I) {
        const intervals = [];
        const l = X[i] - R[i];
        const r = X[i] + R[i];

        // For any previously placed circles that may overlap this circle, compute
        // the y-positions that place this circle tangent to these other circles.
        // https://observablehq.com/@mbostock/circle-offset-along-line
        tree.queryInterval(l - padding, r + padding, ([,, j]) => {
          const yj = Y[j];
          const dx = X[i] - X[j];
          const dr = R[i] + padding + R[j];
          const dy = Math.sqrt(dr * dr - dx * dx);
          intervals.push([yj - dy, yj + dy]);
        });

        // Find the best y-value where this circle can fit.
        for (let y of intervals.flat().sort(compare)) {
          if (intervals.every(([lo, hi]) => y <= lo || y >= hi)) {
            Y[i] = y;
            break;
          }
        }

        // Insert the placed circle into the interval tree.
        tree.insert([l, r, i]);
      }
      for (const i of I) Y[i] = Y[i] * ky + ty;
    }
    return {index, values};
  });
}

function compareSymmetric(a, b) {
  return Math.abs(a) - Math.abs(b);
}

function compareAscending(a, b) {
  return (a < 0) - (b < 0) || (a - b);
}
112 changes: 112 additions & 0 deletions src/layouts/hexbin.js
@@ -0,0 +1,112 @@
import {groups} from "d3";
import {layout} from "./index.js";
import {basic} from "../transforms/basic.js";
import {maybeOutputs} from "../transforms/group.js";

const defaults = {
  ariaLabel: "hex",
Review comment (Member):

The defaults here mean something different than they do for marks, where they are passed as a separate argument to the Mark constructor: these defaults mean default options. If you specify {ariaLabel: "hex"}, it means that you want an ariaLabel channel derived from the “hex” field (d => d.hex) of data. When I fixed mark filtering this was causing all the marks to be dropped because the ariaLabel channel values were [undefined, undefined, undefined, …]. Oops! 😄

  symbol: "hexagon"
};

// width factor (allows the hexbin transform to work with circular dots!)
const w0 = Math.sin(Math.PI / 3);

function hbin(I, X, Y, r) {
  const dx = r * 2 * w0;
  const dy = r * 1.5;
  const keys = new Map();
  return groups(I, i => {
    let px = X[i] / dx;
    let py = Y[i] / dy;
    if (isNaN(px) || isNaN(py)) return;
    let pj = Math.round(py),
        pi = Math.round(px = px - (pj & 1) / 2),
        py1 = py - pj;
    if (Math.abs(py1) * 3 > 1) {
      let px1 = px - pi,
          pi2 = pi + (px < pi ? -1 : 1) / 2,
          pj2 = pj + (py < pj ? -1 : 1),
          px2 = px - pi2,
          py2 = py - pj2;
      if (px1 * px1 + py1 * py1 > px2 * px2 + py2 * py2) pi = pi2 + (pj & 1 ? 1 : -1) / 2, pj = pj2;
    }
    const key = `${pi}|${pj}`;
    keys.set(key, [pi, pj]);
    return key;
  })
    .filter(([p]) => p)
    .map(([p, bin]) => {
      const [pi, pj] = keys.get(p);
      bin.x = (pi + (pj & 1) / 2) * dx;
      bin.y = pj * dy;
      return bin;
    });
}

// Allow hexbin options to be specified as part of outputs; merge them into options.
function mergeOptions({radius = 10, ...outputs}, options) {
Review comment (Member):

If the default radius of 10 is defined this way, then no default will be provided if options says something like {radius: undefined}.
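
The issue can be reproduced in isolation. In the sketch below, `mergeOptions` repeats the pattern from the diff, while `mergeOptionsFixed` is a hypothetical alternative that resolves the default after merging; it is one possible fix, not necessarily the one that was adopted.

```js
// The pattern from the diff: the default is applied while destructuring outputs,
// but an explicit options.radius overrides it when spread, even if undefined.
function mergeOptions({radius = 10, ...outputs}, options) {
  return [outputs, {radius, ...options}];
}

// Hypothetical alternative: pick the radius after merging, so that undefined
// in either argument still falls back to 10.
function mergeOptionsFixed({radius: r1, ...outputs}, {radius: r2, ...options} = {}) {
  const radius = r2 ?? r1 ?? 10;
  return [outputs, {...options, radius}];
}

const [, merged] = mergeOptions({fill: "count"}, {x: "a", y: "b", radius: undefined});
const [, fixed] = mergeOptionsFixed({fill: "count"}, {x: "a", y: "b", radius: undefined});
console.log(merged.radius, fixed.radius); // prints: undefined 10
```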

  return [outputs, {radius, ...options}];
}

function hexbinLayout(radius, outputs, options) {
  // we defer to Plot.bin’s reducers, but some of them are not supported
  for (const reduce of Object.values(outputs)) {
    if (typeof reduce === "string"
      && !reduce.match(/^(first|last|count|distinct|sum|deviation|min|min-index|max|max-index|mean|median|variance|mode|proportion|proportion-facet)$/i))
      throw new Error(`invalid reduce ${reduce}`);
  }
  outputs = maybeOutputs(outputs, options);
  const rescales = {
    r: {scale: "r", options: {range: [0, radius * w0]}},
    fill: {scale: "color"},
    stroke: {scale: "color"},
    fillOpacity: {scale: "opacity"},
    strokeOpacity: {scale: "opacity"},
    symbol: {scale: "symbol"}
  };
  const {x, y} = options;
  if (x == null) throw new Error("missing channel: x");
  if (y == null) throw new Error("missing channel: y");
  return layout({...defaults, ...options}, function(index, scales, {x: X, y: Y}) {
    const values = {x: [], y: [], r: []};
    const channels = [];
    const newIndex = [];
    for (const o of outputs) {
      o.initialize(this.data);
Review comment (Member):

This is referencing mark.data, rather than whatever the possible prior mark.transform returned as data during mark.initialize. Probably this means we’ll need to pass the (possibly transformed) data to the layout rather than expecting layouts to reference this.data.

      o.scope("data", index);
Review comment by @mbostock (Member), Mar 3, 2022:

This call shouldn’t be necessary—calling output.initialize will invoke the reducer:

if (reducer.scope === "data") {
  context = reducer.reduce(range(data), V);
}

Also here index is a nested array (like facets).

    }
    let n = 0;
    for (const I of index) {
      const facetIndex = [];
      newIndex.push(facetIndex);
      const bins = hbin(I, X, Y, radius);
      for (const o of outputs) {
        o.scope("facet", I);
        for (const bin of bins) o.reduce(bin);
Review comment (Member):

There are typically many more bins than outputs, so I suspect it’s more efficient to do another for (const o of outputs) inside the for (const bin of bins) loop below, rather than another for (const bin of bins) loop here.

      }
      for (const bin of bins) {
        values.x[n] = bin.x;
        values.y[n] = bin.y;
        facetIndex.push(n++);
      }
    }
    for (const o of outputs) {
      if (o.name in rescales) {
        const {scale, options} = rescales[o.name];
        const value = o.output.transform();
        channels.push([o.name, {scale, value, options}]);
      } else {
        values[o.name] = o.output.transform();
      }
    }
Review comment on lines +93 to +101 by @mbostock (Member), Mar 4, 2022:

This is using the “materialized” representation of a channel, similar to that returned by the Channel constructor, rather than the channel “descriptors” that are used e.g. by the Mark constructor and mark.channels. Channel descriptors are materialized inside of the mark.initialize, given data. I wonder if it would be more appropriate to use the “channel descriptor” representation here, rather than having layouts return materialized channels, so as to avoid introducing a new public signature for channel objects.

This is the sort of thing that makes me wish we used TypeScript, since we’d be forced to be explicit about the shapes of these objects. We’ll get there eventually I guess. 😄

    if (!channels.find(([key]) => key === "r")) values.r = Array.from(values.x).fill(radius);
Review comment (Member):

Rather than constructing an r channel when the radius is constant, we can instead populate the r option as a constant up above.


    return {index: newIndex, values, channels};
  });
}

export function hexbin(outputs, options) {
  ([outputs, options] = mergeOptions(outputs, options));
  const {radius, ...inputs} = options;
Review comment (Member):

For both hexbin and hexgrid, I think we’ll want to use r instead of radius for consistency with dot.

Review comment (Member):

Actually, it’s complicated. We can’t just use r because you might want to use the r channel, e.g.,

Plot.hexbin({r: "sum"}, {x: "culmen_depth_mm", y: "culmen_length_mm", r: "body_mass_g"})

so you need a separate option to specify the size of the hexagonal grid.

At the same time, though, it would really be nice to say:

Plot.hexbin({fill: "count"}, {x: "culmen_depth_mm", y: "culmen_length_mm", r: 20})

Perhaps we can find a way to make both work.

  return basic(hexbinLayout(radius, outputs, inputs));
}
19 changes: 19 additions & 0 deletions src/layouts/index.js
@@ -0,0 +1,19 @@
export function layout({layout: layout1, ...options}, layout2) {
  if (layout2 == null) throw new Error("invalid layout");
  layout2 = partialLayout(layout2);
  if (layout1 != null) layout2 = composeLayout(layout1, layout2);
  return {...options, layout: layout2};
}

function composeLayout(l1, l2) {
  return function(index, scales, values, dimensions) {
    values = l1.call(this, index, scales, values, dimensions);
Review comment by @mbostock (Member), Mar 3, 2022:

Under the redesigned signature, mark.layout returns {index, values, channels}, so we don’t want to assign to values here. Similarly the implementation of partialLayout will need to change.
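
A composition that respects the {index, values, channels} return shape could look roughly like the sketch below. This is an assumption about how the redesigned signature might be handled, not the implementation that followed; in particular, concatenating the channels of both layouts is a guess.

```js
// Hypothetical composition for layouts that return {index, values, channels}:
// the second layout receives the index and values produced by the first.
function composeLayout(l1, l2) {
  return function (index, scales, values, dimensions) {
    const a = l1.call(this, index, scales, values, dimensions);
    const b = l2.call(this, a.index, scales, a.values, dimensions);
    return {
      index: b.index,
      values: b.values,
      // Assumption: channels awaiting a second scaling pass are simply concatenated.
      channels: [...(a.channels ?? []), ...(b.channels ?? [])]
    };
  };
}
```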

    return l2.call(this, index, scales, values, dimensions);
  };
}

function partialLayout(l) {
  return function(index, scales, values, dimensions) {
    return {...values, ...l.call(this, index, scales, values, dimensions)};
  };
}
3 changes: 2 additions & 1 deletion src/marks/dot.js
@@ -1,8 +1,9 @@
import {create, path, symbolCircle} from "d3";
import {positive} from "../defined.js";
import {identity, maybeFrameAnchor, maybeNumberChannel, maybeSymbolChannel, maybeTuple} from "../options.js";
import {identity, maybeFrameAnchor, maybeNumberChannel, maybeTuple} from "../options.js";
import {Mark} from "../plot.js";
import {applyChannelStyles, applyDirectStyles, applyFrameAnchor, applyIndirectStyles, applyTransform, offset} from "../style.js";
import {maybeSymbolChannel} from "../symbols.js";

const defaults = {
ariaLabel: "dot",