Skip to content

Fil/reduce empty bins to null #491

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed

Conversation

Fil
Copy link
Contributor

@Fil Fil commented Aug 9, 2021

supersedes #489

@Fil
Copy link
Contributor Author

Fil commented Aug 9, 2021

(tested with #490 : 👍)

@Fil Fil requested a review from mbostock August 9, 2021 17:41
@Fil Fil changed the title Fil/reduce empty bins to null Fil/reduce empty bins to NaN Aug 9, 2021
Copy link
Member

@mbostock mbostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we want to return the empty bins, then we should incorporate the binfilter optimization here, and possibly we optimize rendering such that zero-width or zero-height rects and bars are not instantiated in SVG. We should not change the semantics of the reducers that should return zero for empty bins.

@@ -199,13 +199,14 @@ const reduceLast = {
const reduceCount = {
label: "Frequency",
reduce(I) {
return I.length;
return I.length || NaN;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels wrong. The count is zero for an empty bin, not NaN. Same with distinct and proportion. I agree that other reducers (such as first) should return undefined for an empty bin, but they already do, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning NaN is analogous to ignoring the empty bins and has the same result, except for line and area where it's semantically better (an empty bin results in a cut of the line rather than an interpolation between full bins). But I'll check if the check on emptiness can be done before the reducer, because that indeed feels wrong.

Note: the only other reducer which returned 0 was sum. I believe there's a tiny difference between a sum of empty and a sum of 1 and -1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that the effect is (roughly) the same on the resulting chart, but it changes the semantics of the bin transform, which I feel is undesirable.


function nonempty2([, {size}]) {
return size > 0;
return [x0, x1, set.size ? I => I.filter(set.has, set) : () => []]; // TODO optimize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a nice optimization (if we want to return empty bins). 👍

Fil added 4 commits August 9, 2021 21:41
creates channel values that will result in the corresponding mark to be filtered out, or skipped in case of a grouped mark
…lement is not passing

the athletes-height-weight-bin-stroke.js needed a refinement if we wanted to ignore all the rects that we didn't want to show (instead of having invisible rects on the small non-empty bins)
(this also avoids looking up I[-1] (last) or I[0] with non-existent indices)
@Fil Fil force-pushed the fil/reduce-empty-bins-to-null branch from 7126482 to 50de216 Compare August 9, 2021 19:41
@Fil
Copy link
Contributor Author

Fil commented Aug 10, 2021

In terms of information, we want to be able to tell "there is no data" separately from "the result is zero", because in one case the result can be skipped by a line / ignored by a bar, whereas in the other it should be drawn even if it's a zero. Would you have the same gut feeling if the reducers returned null instead of NaN for "perhaps invalid bins"?

The code could work with nulls, even though it would mean a special case in the stack transform. (What to do with nulls when stacking? If it's bars that we're stacking, we can ignore nulls, but if if it's areas we probably want the ribbon to get thinner at that point—each mark would have to call the default stack transform with the relevant option to "ignore nulls" or "zero nulls").

The stack transform can then filter out invalid bars, which should be an option since we'll want it if calling from Plot.barY but not from Plot.areaY
@Fil Fil changed the title Fil/reduce empty bins to NaN Fil/reduce empty bins to null Aug 10, 2021
@Fil
Copy link
Contributor Author

Fil commented Aug 10, 2021

TODO: this filtering of invalid/empty spots in Plot.stack should be an option: we'll want it if calling from Plot.barY but not from Plot.areaY.

@mbostock
Copy link
Member

mbostock commented Aug 10, 2021

I see. I think what I’m objecting to is that it feels “icky” to encode (overload) the empty bit into the value returned by the reducer, and instead there should be some out-of-band way to communicate which bins are empty to things that care. I’ll think about this some more and try to come up with an alternative.

@mbostock
Copy link
Member

I’ve mulled on this and I’d like to proceed with #495. I do think it makes sense that the bin transform filters by default (like the group transform, which does not return empty groups). And I feel an explicit filter option is more straightforward than giving the reducers special behavior for the empty case, and then likewise the stack transform needing special behavior for null and NaN. Thank you for engaging in this exploration with me and let’s chat about this tomorrow!

@mbostock mbostock closed this Aug 11, 2021
Fil added a commit that referenced this pull request Aug 11, 2021
… by number of cylinders

The bins are sorted by decreasing r, so that they are all visible.

The example would benefit from stackR (#197).

It could also benefit from a strategy to create missing values for the line, so that it's broken when there are no data. However, it won't work with an approach such as "return empty bins" (#495), because returning empty bins will not create the *z* values for each and every category, which would be necessary if we wanted to create broken lines. This shows that a generic foolproof solution to #351 will require much more than #495 (and #489 and #491 are not better in that regard).
@Fil Fil mentioned this pull request Aug 11, 2021
mbostock added a commit that referenced this pull request Aug 11, 2021
* This example plot computes the median of cars' economy (mpg), grouped by number of cylinders

The bins are sorted by decreasing r, so that they are all visible.

The example would benefit from stackR (#197).

It could also benefit from a strategy to create missing values for the line, so that it's broken when there are no data. However, it won't work with an approach such as "return empty bins" (#495), because returning empty bins will not create the *z* values for each and every category, which would be necessary if we wanted to create broken lines. This shows that a generic foolproof solution to #351 will require much more than #495 (and #489 and #491 are not better in that regard).

* Update test/plots/cars-mpg.js

Co-authored-by: Mike Bostock <[email protected]>

* Update test/plots/cars-mpg.js

Co-authored-by: Mike Bostock <[email protected]>

* zero, not filter

* group, not bin

* remove console.log

* stroke, not fill

Co-authored-by: Mike Bostock <[email protected]>
@Fil Fil deleted the fil/reduce-empty-bins-to-null branch August 11, 2021 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants