The bin transform should detect integers #1013

Open
mbostock opened this issue Jul 27, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@mbostock
Member

It’d be nice if the bin transform were smart enough to detect integers and make sure that the bin size is not fractional. For example, consider the Chinook dataset, where the MediaTypeId column is a number whose value is 1, 2, 3, 4, or 5:

Plot.plot({
  marks: [
    Plot.rectY(tracks, Plot.binX({y: "count"}, {x: "MediaTypeId"})),
    Plot.ruleY([0])
  ]
})

If the bin transform detected integers automatically, you could get something like this instead:

Plot.plot({
  x: {
    interval: 1
  },
  marks: [
    Plot.rectY(tracks, Plot.binX({y: "count"}, {x: "MediaTypeId", interval: 1})),
    Plot.ruleY([0])
  ]
})
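
As a rough illustration of the detection described above, here is a minimal sketch. The helper `detectIntegerInterval` and the sample values are hypothetical; this is not Plot's API or its actual implementation, only one way the check could work.

```javascript
// Hypothetical helper (not part of Plot): returns 1 when every defined value
// is an integer, or undefined so the caller can keep its default thresholds.
function detectIntegerInterval(values) {
  let seen = false;
  for (const v of values) {
    if (v == null || Number.isNaN(v)) continue; // ignore missing values
    if (!Number.isInteger(v)) return undefined; // fractional or non-numeric value
    seen = true;
  }
  return seen ? 1 : undefined;
}

// Made-up sample standing in for the MediaTypeId column.
const mediaTypeIds = [1, 1, 2, 3, 3, 3, 5];
const interval = detectIntegerInterval(mediaTypeIds);
```

The detected value could then be passed as the `interval` option shown in the example above. For integer data spanning a large range, a fixed interval of 1 would produce too many bins, so a real implementation would presumably choose a larger integer step instead.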

Potentially this could also work with the interval-aware default tick format (#932).

Plot.plot({
  x: {
    interval: 1,
    tickFormat: ""
  },
  marks: [
    Plot.rectY(tracks, Plot.binX({y: "count"}, {x: "MediaTypeId", interval: 1})),
    Plot.ruleY([0])
  ]
})

Though, I suppose the group transform would be even better here…

Plot.plot({
  marks: [
    Plot.barY(tracks, Plot.groupX({y: "count"}, {x: "MediaTypeId"})),
    Plot.ruleY([0])
  ]
})
@mbostock mbostock added the enhancement New feature or request label Jul 27, 2022
@Fil
Contributor

Fil commented Jul 27, 2022

Related: #932 #355 #734

When switching to groups (which gives a better histogram in this case), there is a risk of not showing groups with no data, so the interval option is still needed.
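
To make the risk concrete, here is a small sketch in plain JavaScript. The helper `countByInteger` is hypothetical and assumes integer inputs; it is not how Plot implements grouping or the interval option. Grouping alone only produces keys that occur in the data, so the sketch fills every integer between the minimum and maximum with a zero count.

```javascript
// Hypothetical helper: count occurrences per integer value and emit a zero
// count for every integer in [min, max] that does not occur in the data.
function countByInteger(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1);
  if (counts.size === 0) return [];
  let lo = Infinity;
  let hi = -Infinity;
  for (const v of counts.keys()) {
    if (v < lo) lo = v;
    if (v > hi) hi = v;
  }
  const result = [];
  for (let v = lo; v <= hi; ++v) {
    result.push({value: v, count: counts.get(v) ?? 0});
  }
  return result;
}

// Values 3 and 4 are absent from the input but still appear with count 0.
const filled = countByInteger([1, 1, 2, 5]);
```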

@mbostock
Member Author

There’s probably a similar enhancement here with temporal data: e.g., if the values are all at UTC midnights, then we shouldn’t choose a bin threshold shorter than d3.utcDay. But testing for lots of time intervals (seconds, minutes, hours, days, weeks, months, years) might be slow… though maybe still fast enough to be worth doing.
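
On the performance concern, one possible approach is a single pass over the data that exploits the nesting of these intervals: any value aligned to a year is also aligned to a month, a day, and so on. The sketch below uses only built-in `Date` methods; the names `candidates` and `detectTimeInterval` are hypothetical, weeks are left out because they do not nest inside months, and none of this is Plot's or D3's actual code.

```javascript
// True when the date falls exactly on a UTC midnight.
const isUtcMidnight = (d) => d.getTime() % 86400000 === 0;

// Candidate intervals from coarsest to finest. Each test checks whether a
// Date lies exactly on a boundary of that interval in UTC.
const candidates = [
  {name: "year", test: (d) => isUtcMidnight(d) && d.getUTCDate() === 1 && d.getUTCMonth() === 0},
  {name: "month", test: (d) => isUtcMidnight(d) && d.getUTCDate() === 1},
  {name: "day", test: isUtcMidnight},
  {name: "hour", test: (d) => d.getTime() % 3600000 === 0},
  {name: "minute", test: (d) => d.getTime() % 60000 === 0},
  {name: "second", test: (d) => d.getTime() % 1000 === 0}
];

// Returns the name of the coarsest candidate that every valid date aligns to,
// or undefined if there are no valid dates or none of the candidates fit.
function detectTimeInterval(dates) {
  let i = 0; // index of the coarsest candidate still consistent with the data
  let seen = false;
  for (const d of dates) {
    if (!(d instanceof Date) || Number.isNaN(d.getTime())) continue;
    seen = true;
    // Because the candidates are nested, dates seen earlier also satisfy any
    // finer candidate, so the index only ever moves forward.
    while (i < candidates.length && !candidates[i].test(d)) ++i;
    if (i === candidates.length) return undefined;
  }
  return seen ? candidates[i].name : undefined;
}

const daily = [new Date(Date.UTC(2022, 0, 1)), new Date(Date.UTC(2022, 6, 27))];
const detected = detectTimeInterval(daily); // "day"
```

Each value is tested once, plus at most one extra test per candidate over the whole pass, so the cost stays roughly linear in the number of values. A result of "day" would then suggest not choosing a bin threshold shorter than d3.utcDay.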

@mbostock
Member Author

mbostock commented Jun 1, 2023

Another example in the wild: https://twitter.com/slothstats/status/1664091552627539968

@mbostock
Member Author

This can be generalized to other intervals too. For example, if you have daily data, you don’t want the bin transform using hourly bins.


2 participants