-
Notifications
You must be signed in to change notification settings - Fork 185
dodge (beeswarms) #648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dodge (beeswarms) #648
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the feature, but I'd prefer if we could have a minimal code footprint in src/dot.js, since that's one of the first files that someone would read to understand how to write a mark. Also if we introduce layouts, we'll probably have more of them with time :)
I can see two options:
- Make it a plugin, rather than a core feature (as in https://observablehq.com/@fil/experimental-plot-beeswarm, or better—and that is a good opportunity to define what a plugin is)
- Move the specific code to a different file in src (layouts/dodge.js ?), but still call it from within dot.js
How to specify the Rsep (separation radius option)?
{dodge: "center", separation: 2} would be consistent with the "flat" options we have elsewhere (e.g. stack options), but see below ↓
Would it be better to specify y: "dodge" or some such (i.e., tied to the channel), instead of dodge: true?
maybe something like y: {layout: "dodge", separation: 2, …moreLayoutOptions} ?
I think this is useful enough that I want it included by default, rather than folks having to go look for a plugin and learn how to load it. I don’t think we should optimize the readability of the implementation. Or at least, it’s not a priority relative to the exposed functionality of Plot which will be used by many more people. Of course I will endeavor to build a beautiful implementation but I do not expect Plot’s implementation to be “easy” to understand, or simple (see the bin transform, for example). I do think there will be other layouts, and that we may want this functionality shared across marks, and thus it’s useful to think about both how the code should be structured within Plot, and whether we can generalize how it is exposed, too. The idea of binding a layout to a channel is interesting. Though, I think it may be limiting, as I expect there will be other cases where a layout may produce more than one channel. So, perhaps there’s a “layout” option, similar to a transform? Like |
This looks good to me, and thanks for the mention. i hope this isn't just shamelessly promoting my own idea, but I wonder if it might be worth considering a "compact beeswarm" option if the radii are all equal. Some examples of compact beeswarm vs dodge are in my initial post here, and an explanation of the method is here. For many datasets, I prefer it to the dodge function, just because it doesn't have a windblown look. I don't think any of my compact beeswarm implementations have particularly good code style, but I could potentially contribute to an implementation if it would be useful. I also completely understand if it's not something you want to add to Plot at the moment. |
src/marks/dot.js
Outdated
// Find the best y-value where this circle can fit. | ||
for (let y of intervals.flat().sort(compare)) { | ||
if (intervals.every(([lo, hi]) => y <= lo || y >= hi)) { | ||
Y[i] = y; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we’re setting Y[i] = 0 if there’s no conflict, but I wonder at least in the case of asymmetric swarms whether we should instead be setting it to R[i] + Rsep so that the bottom of the swarm is aligned (rather than the centers of the circles).
Thanks @jtrim-ons! I think this initial algorithm is “good enough” to start, but I’m definitely willing to swap it out with something that performs better in the future. I’d prefer if the algorithm supported varying radius, though Plot could chose a special algorithm for fixed radius dots if we wanted. The main thing to figure out with this PR is how much we want to generalize the concept of layouts, or if this will be a bespoke feature for dots. (Or if it can start as bespoke and evolve into a more general approach in the future.) |
Also related, Dorling cartograms (for Plot.Carto :-) ) |
6f8d9cb
to
08f0e0b
Compare
I’ve now implemented this as a layout independent of marks, as a counterproposal to #691. A layout takes the same arguments as a mark’s render function and returns transformed values rather, providing a hook for transforming channels before rendering. Layouts are invoked similarly to option transforms, and thus can access (and even transform!) mark options, or take separate layout-specific options as desired. |
b400163
to
7b9d931
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really fun and powerful to see how we can add new layouts (I've created a circlePacking layout in less than 20 lines of code).
The dodge layout looks good to me.
As a follow-up or as part of this PR we want to extend it to work with a channel r, even if the underlying mark has no use for it, as this would enable labelling bubble charts with a variable radius. (As discussed in #708).
I'd also prefer if the default r for a symbol dot was consistent across the dot mark and the dodge layout (4.5 instead of 3), but not at the cost of overcomplicating things — (also discussed in #708, with a possible solution of checking this.r).
I'll try and work on these two issues, which are not blocking.
Besides that, it needs documentation. A starting point is available in #708.
During review we mentioned a possible (structural) change which would be to pass all the facets at once instead of calling the function once for each facet—thus allowing a layout to worked "globally" or "locally" on faceted data. |
Another layout idea: a strategy for avoiding occlusion with text labels, like @fil/occlusion. Though I’m not sure how the layout would determine the bounding box for the text labels. Would it use a heuristic like we do for line wrapping? Would it use an offscreen canvas and measureText? |
const index = filter(markIndex.get(mark), channels, values); | ||
if (mark.layout != null) values = mark.layout(index, scales, values, dimensions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (mark.layout != null) values = mark.layout(index, scales, values, dimensions); | |
if (mark.layout != null) values = mark.layout(index.slice(), scales, values, dimensions); |
should we make a defensive copy?
Alternatively, if we assume that it is legal for the layout to modify the index, write it as:
if (mark.layout != null) values = mark.layout(index, scales, values, dimensions); | |
if (mark.layout != null) ({index, values} = mark.layout(index, scales, values, dimensions)); |
(and same in the facet call).
I think my preference would go to explicitly allowing a layout to change the index, for example a hexbin layout could group points into hexagons, and a unit chart/isotype layout could create several symbols out of (say) a bar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we make a defensive copy?
I feel like we’ve discussed this a bunch, so I want to reiterate what our conventions are.
-
We favor immutability. We typically use “copy on write” (e.g., returning a new object or array) rather than mutating the value in-place. This is especially true regarding arguments to a function, where the caller typically would not expect the passed values to be mutated as a side effect. In cases where we violate this principle, we should call it out (e.g., axes.js).
-
We aren’t defensive. If we invoke a user function, e.g. a custom reducer, we do not create a defensive copy to protect against mutation. Instead we assume that the user function will not modify the input. If user code violates this assumption, the behavior is undefined. “All bets are off.” We do this because creating a defensive copy adds significant overhead and would effectively penalize “well-behaved” users because of hypothetical bad behavior.
-
We create copies when transferring ownership. (This could be considered an exception or nuance to the previous rule.) For functions that return values, as opposed to function arguments, we consider the returned values to be owned by the user. The user should be allowed to mutate the values if they desire. This means we need to return a copy. For example this is used in plot.scale when we return a domain.
Mutation is also acceptable if it’s not “visible externally” as a performance optimization, though this should still be used with caution.
Suggestions:
Besides these suggestions:
|
I think we should try implementing these and see where that takes us. For hexbin, I’m guessing that’s more of a mark than a layout (or transform)? Because if you’re producing hexagonal bins, you almost certainly want to render them as hexagons, meaning that the representation is dictated and it’s not simply transforming channel values. I’ve been thinking about density contours similarly, which perhaps we could also implement. For isotype and waffle charts (is that what you meant by unit chart?), it feels like it might be more of a transform than a layout. Or maybe that’s a mark, too? Or a pattern? In any case it should be easier to answer these questions through prototyping. 😁 |
Superseded by #775. |
Inspired by @yurivish’s Building a Better Beeswarm and @jtrim-ons’s non-force-directed beeswarms, but using interval-tree-1d for simpler (and hopefully faster?) intersection testing.
Demo: https://observablehq.com/d/2055ab012188f381
Still to figure out:
dodge: {anchor: "center", separation: 2}
?y: "dodge"
or some such (i.e., tied to the channel), instead ofdodge: true
?r
is a constant?Fixes #164.