An option to include empty bins #489
When using line or area on binned data, it can be necessary to set {skip: false} in order to reduce to count=0 values.
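To illustrate why dropping empty bins matters for line and area marks, here is a minimal standalone sketch. It does not use Plot's bin transform; `binCounts` is a hypothetical helper, and its `skip` option only mirrors the name proposed in this PR.

```javascript
// Hypothetical helper (not Plot's implementation): count values per bin,
// optionally skipping bins that contain no values.
function binCounts(values, {thresholds, skip = true}) {
  const bins = [];
  for (let i = 0; i < thresholds.length - 1; ++i) {
    const x0 = thresholds[i];
    const x1 = thresholds[i + 1];
    const count = values.filter((v) => v >= x0 && v < x1).length;
    if (count === 0 && skip) continue; // default: empty bins are dropped
    bins.push({x0, x1, count});
  }
  return bins;
}

const values = [1, 2, 2, 7, 8];
const thresholds = [0, 3, 6, 9];

// With skipping, the middle bin [3, 6) disappears, so a line drawn through
// the remaining points would connect the first bin directly to the last.
const skipped = binCounts(values, {thresholds});

// Without skipping, the middle bin is kept with count 0, so the line dips to zero.
const kept = binCounts(values, {thresholds, skip: false});
```

Here `skipped` has two bins and `kept` has three, the middle one with a count of 0.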
Good idea!
README.md
Outdated
@@ -1032,6 +1032,7 @@ To control how the quantitative dimensions *x* and *y* are divided into bins, th
* **thresholds** - the threshold values; see below
* **domain** - values outside the domain will be omitted
* **cumulative** - if positive, each bin will contain all lesser bins
* **skip** - skip empty bins (default: true)
How about the name empty, where true means to include empty bins, defaulting to false?
src/transforms/bin.js
Outdated
value = {...maybeValue(value)};
if (value.domain === undefined) value.domain = domain;
if (value.cumulative === undefined) value.cumulative = cumulative;
if (value.thresholds === undefined) value.thresholds = thresholds;
if (value.value === undefined) value.value = defaultValue;
value.skip = skip;
Suggested change:
- value.skip = skip;
+ value.skip = !!skip;
Another thought is whether the bin transform should not filter by default, since we are trading off a performance optimization for a change in semantics. I think ideally we’d be able to tell whether the bins were consumed by a mark that requires continuous semantics? Anyway, this is a fine change to start.
Switched to the empty option as suggested. If this looks good to you, please merge?
One minor thing: I noticed that this code doesn’t allow you to set the empty option for an individual dimension, as we do for domain, cumulative, and thresholds. It would be possible to support this by adding the following:
if (value.empty === undefined) value.empty = empty;
value.empty = !!value.empty;
However, I don’t think it’s meaningful to support this on a per-dimension basis because the empty option applies to the constructed bins across dimensions. So even though the code separately looks at bx.empty and by.empty, these should always have the same value, and even if they have different values, the transform will only return non-empty values if both bx.empty and by.empty are truthy.
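As a rough sketch of the per-dimension defaulting described above: `maybeBinOptions` is a hypothetical stand-in for the real option handling, and the rule for combining bx.empty and by.empty (requiring both to be truthy) is my reading of the comment, not code taken from the PR.

```javascript
// Hypothetical helper: apply the top-level `empty` default to one dimension's
// options, then coerce the result to a boolean.
function maybeBinOptions(value = {}, {empty} = {}) {
  value = {...value};
  if (value.empty === undefined) value.empty = empty; // inherit the shared default
  value.empty = !!value.empty; // normalize truthy/falsy to true/false
  return value;
}

// x inherits the shared option; y overrides it with a falsy value.
const bx = maybeBinOptions({}, {empty: true});
const by = maybeBinOptions({empty: 0}, {empty: true});

// Assumed combination rule: the flag only takes effect if both dimensions agree.
const includeEmpty = bx.empty && by.empty;
```

With these inputs, bx.empty is true, by.empty is false, and includeEmpty is false, which shows why a per-dimension setting adds little: a mismatch simply disables the option.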
Another tiny observation is that b[0] will be undefined when the bins are empty; in other words, the z, fill, and stroke channels, if any, will be undefined for empty bins. But that is exactly what we want.
And lastly I wonder if the rect mark should be smart enough to not create rect elements if the computed width or height is nonpositive? That might make rendering faster even when rendering empty bins. But I don’t feel the need to implement that optimization right now. 🙂
I want to add a test plot before I merge, but this works well in https://observablehq.com/@fil/missing-data-in-lines-489
082b1f2 to 33c54eb
OK now I want to walk this back, and replace it with a different strategy based on #490.
Can you give me a hint as to what you’re thinking? I don’t see the obvious connection between this and #490.
The idea is that all the default reducers should return null when given an empty array. Currently these reducers are not called for empty bins, but if they were called, sum, length and proportion would return 0. However, if we change them to return null, and if they are used to inform a "filtered channel" (like any of stroke, fill, fillOpacity etc, or y for barY etc):
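A minimal sketch of the reducer idea above, assuming reducers are plain functions over an array of values. The names `sumOrNull` and `countOrNull` are hypothetical and are not Plot's internal reducers.

```javascript
// Hypothetical reducers that return null for an empty bin instead of 0.
const sumOrNull = (values) =>
  values.length === 0 ? null : values.reduce((a, b) => a + b, 0);
const countOrNull = (values) => (values.length === 0 ? null : values.length);

// Three bins, the middle one empty.
const bins = [[1, 2], [], [3]];

// Reduce each bin; the empty bin yields null rather than 0.
const y = bins.map(sumOrNull);
const n = bins.map(countOrNull);

// A mark that filters on this channel could then drop (or break at) null values.
const defined = y.map((v) => v != null);
```

Here y is [3, null, 3] and defined is [true, false, true], so a consumer can tell an empty bin apart from a bin whose sum happens to be 0.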
"filtered channels" are consolidated in #490, which made me think there was a direct connection, but in fact it's orthogonal.
* This example plot computes the median of cars' economy (mpg), grouped by number of cylinders
The bins are sorted by decreasing r, so that they are all visible. The example would benefit from stackR (#197). It could also benefit from a strategy to create missing values for the line, so that it's broken when there are no data. However, it won't work with an approach such as "return empty bins" (#495), because returning empty bins will not create the *z* values for each and every category, which would be necessary if we wanted to create broken lines. This shows that a generic foolproof solution to #351 will require much more than #495 (and #489 and #491 are not better in that regard).
* Update test/plots/cars-mpg.js
Co-authored-by: Mike Bostock <[email protected]>
* Update test/plots/cars-mpg.js
Co-authored-by: Mike Bostock <[email protected]>
* zero, not filter
* group, not bin
* remove console.log
* stroke, not fill
Co-authored-by: Mike Bostock <[email protected]>
When using line or area on binned data, it can be necessary to set {skip: false} in order to reduce to count=0 values.
(Not fully sure that this is the correct solution, or the correct naming.)
build https://observablehq.com/@fil/missing-data-in-lines-489
(let me know if you think the principle sounds correct—currently this is breaking quite a few tests, I'll fix that later.)