-
Notifications
You must be signed in to change notification settings - Fork 185
Fil/reduce empty bins to null #491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(tested with #490 : 👍) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we want to return the empty bins, then we should incorporate the binfilter optimization here, and possibly we optimize rendering such that zero-width or zero-height rects and bars are not instantiated in SVG. We should not change the semantics of the reducers that should return zero for empty bins.
src/transforms/group.js
Outdated
@@ -199,13 +199,14 @@ const reduceLast = { | |||
const reduceCount = { | |||
label: "Frequency", | |||
reduce(I) { | |||
return I.length; | |||
return I.length || NaN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels wrong. The count is zero for an empty bin, not NaN. Same with distinct and proportion. I agree that other reducers (such as first) should return undefined for an empty bin, but they already do, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning NaN is analogous to ignoring the empty bins and has the same result, except for line and area where it's semantically better (an empty bin results in a cut of the line rather than an interpolation between full bins). But I'll check if the check on emptiness can be done before the reducer, because that indeed feels wrong.
Note: the only other reducer which returned 0 was sum. I believe there's a tiny difference between a sum of empty and a sum of 1 and -1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that the effect is (roughly) the same on the resulting chart, but it changes the semantics of the bin transform, which I feel is undesirable.
|
||
function nonempty2([, {size}]) { | ||
return size > 0; | ||
return [x0, x1, set.size ? I => I.filter(set.has, set) : () => []]; // TODO optimize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a nice optimization (if we want to return empty bins). 👍
creates channel values that will result in the corresponding mark to be filtered out, or skipped in case of a grouped mark
…lement is not passing the athletes-height-weight-bin-stroke.js needed a refinement if we wanted to ignore all the rects that we didn't want to show (instead of having invisible rects on the small non-empty bins)
(this also avoids looking up I[-1] (last) or I[0] with non-existent indices)
7126482
to
50de216
Compare
In terms of information, we want to be able to tell "there is no data" separately from "the result is zero", because in one case the result can be skipped by a line / ignored by a bar, whereas in the other it should be drawn even if it's a zero. Would you have the same gut feeling if the reducers returned null instead of NaN for "perhaps invalid bins"? The code could work with nulls, even though it would mean a special case in the stack transform. (What to do with nulls when stacking? If it's bars that we're stacking, we can ignore nulls, but if if it's areas we probably want the ribbon to get thinner at that point—each mark would have to call the default stack transform with the relevant option to "ignore nulls" or "zero nulls"). |
The stack transform can then filter out invalid bars, which should be an option since we'll want it if calling from Plot.barY but not from Plot.areaY
TODO: this filtering of invalid/empty spots in Plot.stack should be an option: we'll want it if calling from Plot.barY but not from Plot.areaY. |
I see. I think what I’m objecting to is that it feels “icky” to encode (overload) the empty bit into the value returned by the reducer, and instead there should be some out-of-band way to communicate which bins are empty to things that care. I’ll think about this some more and try to come up with an alternative. |
I’ve mulled on this and I’d like to proceed with #495. I do think it makes sense that the bin transform filters by default (like the group transform, which does not return empty groups). And I feel an explicit filter option is more straightforward than giving the reducers special behavior for the empty case, and then likewise the stack transform needing special behavior for null and NaN. Thank you for engaging in this exploration with me and let’s chat about this tomorrow! |
… by number of cylinders The bins are sorted by decreasing r, so that they are all visible. The example would benefit from stackR (#197). It could also benefit from a strategy to create missing values for the line, so that it's broken when there are no data. However, it won't work with an approach such as "return empty bins" (#495), because returning empty bins will not create the *z* values for each and every category, which would be necessary if we wanted to create broken lines. This shows that a generic foolproof solution to #351 will require much more than #495 (and #489 and #491 are not better in that regard).
* This example plot computes the median of cars' economy (mpg), grouped by number of cylinders The bins are sorted by decreasing r, so that they are all visible. The example would benefit from stackR (#197). It could also benefit from a strategy to create missing values for the line, so that it's broken when there are no data. However, it won't work with an approach such as "return empty bins" (#495), because returning empty bins will not create the *z* values for each and every category, which would be necessary if we wanted to create broken lines. This shows that a generic foolproof solution to #351 will require much more than #495 (and #489 and #491 are not better in that regard). * Update test/plots/cars-mpg.js Co-authored-by: Mike Bostock <[email protected]> * Update test/plots/cars-mpg.js Co-authored-by: Mike Bostock <[email protected]> * zero, not filter * group, not bin * remove console.log * stroke, not fill Co-authored-by: Mike Bostock <[email protected]>
supersedes #489