Skip to content

dodge (beeswarms) #648

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

dodge (beeswarms) #648

wants to merge 6 commits into from

Conversation

mbostock
Copy link
Member

@mbostock mbostock commented Jan 4, 2022

Inspired by @yurivish’s Building a Better Beeswarm and @jtrim-ons’s non-force-directed beeswarms, but using interval-tree-1d for simpler (and hopefully faster?) intersection testing.

Screen Shot 2022-01-03 at 7 49 46 PM

Plot.plot({
  width: 764,
  height: 400,
  color: {
    scheme: "purd",
    range: [0.15, 0.95]
  },
  r: {
    type: "linear",
    range: [0, 20]
  },
  marks: [
    Plot.dot(dataset, Plot.dodgeY({x: "x", r: "r", fill: "r"}))
  ]
})

Screen Shot 2022-01-03 at 7 49 01 PM

Plot.plot({
  width: 1152,
  height: 400,
  x: {
    domain: [-1, 0],
    grid: true,
    clamp: true
  },
  r: {
    range: [0, 24]
  },
  marks: [
    Plot.dot(mobility, Plot.dodgeY("middle", {
      filter: d => d.in_lockdown === "TRUE",
      sort: "pop",
      reverse: true,
      x: "pct_change",
      r: "pop",
      fill: "currentColor"
    }))
  ]
})

Demo: https://observablehq.com/d/2055ab012188f381

Still to figure out:

  • How to specify the Rsep (separation radius option)? Perhaps dodge: {anchor: "center", separation: 2}?
  • Would it be better to specify y: "dodge" or some such (i.e., tied to the channel), instead of dodge: true?
  • Avoid materializing the R array if r is a constant?
  • Generalize this to other marks? Does Plot need a concept of “layouts” (in pixel space), similar to transforms (data space)?

Fixes #164.

@mbostock mbostock requested a review from Fil January 4, 2022 03:54
Copy link
Contributor

@Fil Fil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the feature, but I'd prefer if we could have a minimal code footprint in src/dot.js, since that's one of the first files that someone would read to understand how to write a mark. Also if we introduce layouts, we'll probably have more of them with time :)

I can see two options:

  1. Make it a plugin, rather than a core feature (as in https://observablehq.com/@fil/experimental-plot-beeswarm, or better—and that is a good opportunity to define what a plugin is)
  2. Move the specific code to a different file in src (layouts/dodge.js ?), but still call it from within dot.js

How to specify the Rsep (separation radius option)?

{dodge: "center", separation: 2} would be consistent with the "flat" options we have elsewhere (e.g. stack options), but see below ↓

Would it be better to specify y: "dodge" or some such (i.e., tied to the channel), instead of dodge: true?

maybe something like y: {layout: "dodge", separation: 2, …moreLayoutOptions} ?

@mbostock
Copy link
Member Author

mbostock commented Jan 4, 2022

I think this is useful enough that I want it included by default, rather than folks having to go look for a plugin and learn how to load it.

I don’t think we should optimize the readability of the implementation. Or at least, it’s not a priority relative to the exposed functionality of Plot which will be used by many more people. Of course I will endeavor to build a beautiful implementation but I do not expect Plot’s implementation to be “easy” to understand, or simple (see the bin transform, for example).

I do think there will be other layouts, and that we may want this functionality shared across marks, and thus it’s useful to think about both how the code should be structured within Plot, and whether we can generalize how it is exposed, too. The idea of binding a layout to a channel is interesting. Though, I think it may be limiting, as I expect there will be other cases where a layout may produce more than one channel. So, perhaps there’s a “layout” option, similar to a transform? Like layout: Plot.dodge(…)?

@jtrim-ons
Copy link

This looks good to me, and thanks for the mention.

i hope this isn't just shamelessly promoting my own idea, but I wonder if it might be worth considering a "compact beeswarm" option if the radii are all equal. Some examples of compact beeswarm vs dodge are in my initial post here, and an explanation of the method is here. For many datasets, I prefer it to the dodge function, just because it doesn't have a windblown look.

I don't think any of my compact beeswarm implementations have particularly good code style, but I could potentially contribute to an implementation if it would be useful. I also completely understand if it's not something you want to add to Plot at the moment.

src/marks/dot.js Outdated
// Find the best y-value where this circle can fit.
for (let y of intervals.flat().sort(compare)) {
if (intervals.every(([lo, hi]) => y <= lo || y >= hi)) {
Y[i] = y;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we’re setting Y[i] = 0 if there’s no conflict, but I wonder at least in the case of asymmetric swarms whether we should instead be setting it to R[i] + Rsep so that the bottom of the swarm is aligned (rather than the centers of the circles).

@mbostock
Copy link
Member Author

mbostock commented Jan 7, 2022

Thanks @jtrim-ons! I think this initial algorithm is “good enough” to start, but I’m definitely willing to swap it out with something that performs better in the future. I’d prefer if the algorithm supported varying radius, though Plot could chose a special algorithm for fixed radius dots if we wanted.

The main thing to figure out with this PR is how much we want to generalize the concept of layouts, or if this will be a bespoke feature for dots. (Or if it can start as bespoke and evolve into a more general approach in the future.)

@Fil
Copy link
Contributor

Fil commented Jan 14, 2022

Also related, Dorling cartograms (for Plot.Carto :-) )

@mbostock mbostock force-pushed the mbostock/dot-dodge branch 4 times, most recently from 6f8d9cb to 08f0e0b Compare January 25, 2022 21:39
@mbostock mbostock requested a review from Fil January 25, 2022 21:42
@mbostock
Copy link
Member Author

I’ve now implemented this as a layout independent of marks, as a counterproposal to #691. A layout takes the same arguments as a mark’s render function and returns transformed values rather, providing a hook for transforming channels before rendering. Layouts are invoked similarly to option transforms, and thus can access (and even transform!) mark options, or take separate layout-specific options as desired.

Copy link
Contributor

@Fil Fil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really fun and powerful to see how we can add new layouts (I've created a circlePacking layout in less than 20 lines of code).

The dodge layout looks good to me.

As a follow-up or as part of this PR we want to extend it to work with a channel r, even if the underlying mark has no use for it, as this would enable labelling bubble charts with a variable radius. (As discussed in #708).

I'd also prefer if the default r for a symbol dot was consistent across the dot mark and the dodge layout (4.5 instead of 3), but not at the cost of overcomplicating things — (also discussed in #708, with a possible solution of checking this.r).

I'll try and work on these two issues, which are not blocking.

Besides that, it needs documentation. A starting point is available in #708.

@Fil
Copy link
Contributor

Fil commented Jan 26, 2022

During review we mentioned a possible (structural) change which would be to pass all the facets at once instead of calling the function once for each facet—thus allowing a layout to worked "globally" or "locally" on faceted data.

@mbostock
Copy link
Member Author

Another layout idea: a strategy for avoiding occlusion with text labels, like @fil/occlusion. Though I’m not sure how the layout would determine the bounding box for the text labels. Would it use a heuristic like we do for line wrapping? Would it use an offscreen canvas and measureText?

@Fil Fil mentioned this pull request Jan 26, 2022
1 task
const index = filter(markIndex.get(mark), channels, values);
if (mark.layout != null) values = mark.layout(index, scales, values, dimensions);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (mark.layout != null) values = mark.layout(index, scales, values, dimensions);
if (mark.layout != null) values = mark.layout(index.slice(), scales, values, dimensions);

should we make a defensive copy?

Alternatively, if we assume that it is legal for the layout to modify the index, write it as:

Suggested change
if (mark.layout != null) values = mark.layout(index, scales, values, dimensions);
if (mark.layout != null) ({index, values} = mark.layout(index, scales, values, dimensions));

(and same in the facet call).

I think my preference would go to explicitly allowing a layout to change the index, for example a hexbin layout could group points into hexagons, and a unit chart/isotype layout could create several symbols out of (say) a bar.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make a defensive copy?

I feel like we’ve discussed this a bunch, so I want to reiterate what our conventions are.

  1. We favor immutability. We typically use “copy on write” (e.g., returning a new object or array) rather than mutating the value in-place. This is especially true regarding arguments to a function, where the caller typically would not expect the passed values to be mutated as a side effect. In cases where we violate this principle, we should call it out (e.g., axes.js).

  2. We aren’t defensive. If we invoke a user function, e.g. a custom reducer, we do not create a defensive copy to protect against mutation. Instead we assume that the user function will not modify the input. If user code violates this assumption, the behavior is undefined. “All bets are off.” We do this because creating a defensive copy adds significant overhead and would effectively penalize “well-behaved” users because of hypothetical bad behavior.

  3. We create copies when transferring ownership. (This could be considered an exception or nuance to the previous rule.) For functions that return values, as opposed to function arguments, we consider the returned values to be owned by the user. The user should be allowed to mutate the values if they desire. This means we need to return a copy. For example this is used in plot.scale when we return a domain.

Mutation is also acceptable if it’s not “visible externally” as a performance optimization, though this should still be used with caution.

@Fil
Copy link
Contributor

Fil commented Jan 27, 2022

Suggestions:

Besides these suggestions:

  • the question whether a layout is explicitly allowed or forbidden to modify the index (dodge (beeswarms) #648 (comment)); this is blocking if we expose the API (in the sense that it would mean a change to the API); but not blocking if we don't yet expose it. NO CHANGE
  • the need for Plot.text(dodgeY({r: channel})) — TBD, but probably not blocking

@mbostock
Copy link
Member Author

I think my preference would go to explicitly allowing a layout to change the index, for example a hexbin layout could group points into hexagons, and a unit chart/isotype layout could create several symbols out of (say) a bar.

I think we should try implementing these and see where that takes us.

For hexbin, I’m guessing that’s more of a mark than a layout (or transform)? Because if you’re producing hexagonal bins, you almost certainly want to render them as hexagons, meaning that the representation is dictated and it’s not simply transforming channel values. I’ve been thinking about density contours similarly, which perhaps we could also implement.

For isotype and waffle charts (is that what you meant by unit chart?), it feels like it might be more of a transform than a layout. Or maybe that’s a mark, too? Or a pattern? In any case it should be easier to answer these questions through prototyping. 😁

@Fil Fil mentioned this pull request Feb 23, 2022
@mbostock
Copy link
Member Author

mbostock commented Mar 1, 2022

Superseded by #775.

@mbostock mbostock closed this Mar 1, 2022
@mbostock mbostock mentioned this pull request Mar 11, 2022
16 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Beeswarm plot?
3 participants