Skip to content

An option to include empty bins #489

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

An option to include empty bins #489

wants to merge 4 commits into from

Conversation

Fil
Copy link
Contributor

@Fil Fil commented Aug 7, 2021

When using line or area on binned data, it can be necessary to set {skip: false} in order to reduce to count=0 values.

(Not fully sure that this is the correct solution, or the correct naming.)

build https://observablehq.com/@fil/missing-data-in-lines-489

(let me know if you think the principle sounds correct—currently this is breaking quite a few tests, I'll fix that later.)

When using line or area on binned data, it can be necessary to set {skip: false} in order to reduce to count=0 values.
@Fil Fil requested review from zanarmstrong and mbostock August 7, 2021 20:30
Copy link
Member

@mbostock mbostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

README.md Outdated
@@ -1032,6 +1032,7 @@ To control how the quantitative dimensions *x* and *y* are divided into bins, th
* **thresholds** - the threshold values; see below
* **domain** - values outside the domain will be omitted
* **cumulative** - if positive, each bin will contain all lesser bins
* **skip** - skip empty bins (default: true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the name empty, where true means to include empty bins, defaulting to false?

value = {...maybeValue(value)};
if (value.domain === undefined) value.domain = domain;
if (value.cumulative === undefined) value.cumulative = cumulative;
if (value.thresholds === undefined) value.thresholds = thresholds;
if (value.value === undefined) value.value = defaultValue;
value.skip = skip;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
value.skip = skip;
value.skip = !!skip;

@Fil Fil changed the title An option to skip empty bins An option to include empty bins Aug 7, 2021
@mbostock
Copy link
Member

mbostock commented Aug 7, 2021

Another thought is whether the bin transform should not filter by default, since we are trading off a performance optimization for a change in semantics. I think ideally we’d be able to tell whether the bins were consumed by a mark that requires continuous semantics? Anyway, this is a fine change to start.

@mbostock mbostock marked this pull request as ready for review August 7, 2021 23:23
Copy link
Member

@mbostock mbostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to the empty option as suggested. If this looks good to you, please merge?

One minor thing: I noticed that this code doesn’t allow you to set the empty option for an individual dimension, as we do for domain, cumulative, and thresholds. It would be possible to support this by adding the following:

if (value.empty === undefined) value.empty = empty;
value.empty = !!value.empty;

However, I don’t think it’s meaningful to support this on a per-dimension basis because the empty option applies to the constructed bins across dimensions. So even though the code separately looks at bx.empty and by.empty, these should always have the same value, and even if they have different values, the transform will only return non-empty values if both bx.empty and by.empty are truthy.

Another tiny observation is that b[0] will be undefined when the bins are empty; or in other words, the z, fill, and stroke channels if any will be undefined for empty bins. But that is exactly what we want.

And lastly I wonder if the rect mark should be smart enough to not create rect elements if the computed width or height is nonpositive? That might make rendering faster even when rendering empty bins. But I don’t feel the need to implementation that optimization right now. 🙂

@Fil
Copy link
Contributor Author

Fil commented Aug 8, 2021

I want to add a test plot before I merge, but this works well in https://observablehq.com/@fil/missing-data-in-lines-489

@Fil Fil force-pushed the fil/bin-explicit-skip branch from 082b1f2 to 33c54eb Compare August 9, 2021 10:41
@Fil
Copy link
Contributor Author

Fil commented Aug 9, 2021

OK now I want to walk this back, and replace it with a different strategy based on #490.

@mbostock
Copy link
Member

mbostock commented Aug 9, 2021

Can you give me a hint as to what you’re thinking? I don’t see the obvious connection between this and #490.

@Fil
Copy link
Contributor Author

Fil commented Aug 9, 2021

The idea is that all the default reducers should return null when given an empty array.

Currently these reducers are not called for empty bins, but if they were called, sum, length and proportion would return 0.

However if we change them to return return null, and if they are used to inform a "filtered channel" (like any of stroke, fill, fillOpacity etc, or y for barY etc):

  • a simple mark (say, a bar) is filtered out
  • a grouped mark (line) knows how to stop (instead of interpolating)
  • we don't need the empty: true option—it's still possible to impute missing values by passing a custom reducer (e.g. (bin) => bin.length) if you want to draw an empty bin with a value of 0.

"filtered channels" are consolidated in #490, which made me think there was a direct connection, but in fact it's orthogonal.

@Fil Fil closed this Aug 9, 2021
@Fil Fil mentioned this pull request Aug 9, 2021
Fil added a commit that referenced this pull request Aug 11, 2021
… by number of cylinders

The bins are sorted by decreasing r, so that they are all visible.

The example would benefit from stackR (#197).

It could also benefit from a strategy to create missing values for the line, so that it's broken when there are no data. However, it won't work with an approach such as "return empty bins" (#495), because returning empty bins will not create the *z* values for each and every category, which would be necessary if we wanted to create broken lines. This shows that a generic foolproof solution to #351 will require much more than #495 (and #489 and #491 are not better in that regard).
@Fil Fil mentioned this pull request Aug 11, 2021
mbostock added a commit that referenced this pull request Aug 11, 2021
* This example plot computes the median of cars' economy (mpg), grouped by number of cylinders

The bins are sorted by decreasing r, so that they are all visible.

The example would benefit from stackR (#197).

It could also benefit from a strategy to create missing values for the line, so that it's broken when there are no data. However, it won't work with an approach such as "return empty bins" (#495), because returning empty bins will not create the *z* values for each and every category, which would be necessary if we wanted to create broken lines. This shows that a generic foolproof solution to #351 will require much more than #495 (and #489 and #491 are not better in that regard).

* Update test/plots/cars-mpg.js

Co-authored-by: Mike Bostock <[email protected]>

* Update test/plots/cars-mpg.js

Co-authored-by: Mike Bostock <[email protected]>

* zero, not filter

* group, not bin

* remove console.log

* stroke, not fill

Co-authored-by: Mike Bostock <[email protected]>
@Fil Fil deleted the fil/bin-explicit-skip branch January 24, 2023 11:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants