Smarter formatting for year number channels? #768

mbostock opened this issue Feb 19, 2022 · 9 comments
Labels
enhancement New feature or request question Further information is needed

Comments

@mbostock
Member

mbostock commented Feb 19, 2022

In a case like this (data), it’d be nice to avoid the commas for the year axis:

[Image: Screen Shot 2022-02-19 at 12.54.12 PM]

Plot.plot({
  width,
  color: { legend: true },
  marks: [
    Plot.rectY(overview, { x: "Year", y: "Value", fill: "Type", interval: 1 })
  ]
})

Of course, you can do it with x: {tickFormat: ""}, but could Plot figure this out automatically?
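
To make the problem concrete, here is a minimal sketch in plain JavaScript, independent of Plot, contrasting a locale-aware number format with plain string conversion. The `en-US` locale and the variable names are assumptions chosen for illustration; Plot's actual default formatter may differ in detail.

```javascript
// A locale-aware number format adds a thousands separator (en-US is an assumption).
const withGrouping = new Intl.NumberFormat("en-US");

// Plain formatting: convert the number to a string with no grouping.
const plain = (d) => `${d}`;

const years = [1990, 2000, 2022];
const grouped = years.map((d) => withGrouping.format(d)); // ["1,990", "2,000", "2,022"]
const ungrouped = years.map(plain); // ["1990", "2000", "2022"]

console.log(grouped, ungrouped);
```

The grouped form is what one wants for counts or amounts, and the ungrouped form is what one wants for years, which is why a single default cannot serve both without some hint.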

Similarly when you do something like title: "Year", it’s a bummer that the automatic formatting for numbers shows a comma. I think we could maybe track a hint that looks for (case-insensitive) “year” and avoids the comma.

@mbostock mbostock added enhancement New feature or request question Further information is needed labels Feb 19, 2022
@Fil
Contributor

Fil commented Feb 19, 2022

related #355

@Fil
Contributor

Fil commented Feb 20, 2022

In fact, this is one of the first questions that came up in a workshop last Friday. But it was happening in Inputs.table; we hadn't started plotting the data yet. The offending field was called "annee" (French for year).

@arky

arky commented Mar 9, 2022

I think this is the most common problem/gotcha for those starting with Plot. Handling it automatically could reduce friction for new users who are not familiar with JavaScript data handling features.

I had posted exactly this question in forums:
https://talk.observablehq.com/t/handling-date-column-during-file-import/6333

https://observablehq.com/@arky/disasters-in-south-eastern-asia-1900-2021

@eagereyes
Contributor

Here's a related paper. I wonder if there are implementations of something like this out there in open-source land; it's such a common issue.

@Fil
Contributor

Fil commented Jun 8, 2022

Question: a possible strategy would be for the default format to depend on the domain. If the domain is contained in, say, [1500, 2200], set the default formatter to be d => `${d}` rather than Intl.NumberFormat. This heuristic might be slightly surprising in the odd case, but it would fix the very common issue of years being poorly formatted, while still giving nice numbers by default in general.
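
A minimal sketch of this domain-based strategy, in plain JavaScript. The helper name `defaultFormatForDomain`, the constants, and the `en-US` locale are hypothetical and are not part of Plot's API; the bounds are simply the "say, [1500, 2200]" from the comment.

```javascript
// Hypothetical bounds, taken from the "say, [1500, 2200]" suggestion.
const YEAR_MIN = 1500;
const YEAR_MAX = 2200;

// Locale is an assumption for illustration.
const numberFormat = new Intl.NumberFormat("en-US");

// Returns a formatter chosen from the scale domain (hypothetical helper).
function defaultFormatForDomain(domain) {
  const lo = Math.min(...domain);
  const hi = Math.max(...domain);
  // An empty domain gives no evidence, so it falls back to the number format.
  const yearLike = domain.length > 0 && lo >= YEAR_MIN && hi <= YEAR_MAX;
  return yearLike ? (d) => `${d}` : (d) => numberFormat.format(d);
}

const formatYears = defaultFormatForDomain([1990, 2022]);
const formatCounts = defaultFormatForDomain([0, 25000]);

console.log(formatYears(2022)); // "2022"
console.log(formatCounts(25000)); // "25,000"
```

Because the decision depends only on the domain, a chart of values that happen to fall between 1500 and 2200 would also lose its commas; that is the "slightly surprising" case mentioned in the comment.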

Fil added a commit that referenced this issue Jun 9, 2022
lists a few TODOs re: the default tick format:
- we don't want decimal notation if the interval is specified as an integer
- we don't want months to appear if the interval is specified as d3.utcYear
- we don't want years to appear with commas (#768)
mbostock added a commit that referenced this issue Jun 10, 2022
* Specifying a scale interval shows the intent of having ordinal numerical or ordinal dates: suppress warning.

Side note: if a numeric interval was specified, string numerics have already been coerced to numbers by the scale transform; so this in fact only has consequences for ordinal dates, such as in the downloads-ordinal test plot.

* document scale intervals

* test plot with year intervals

* Update src/scales.js

Co-authored-by: Mike Bostock <[email protected]>

* Update src/scales.js

Co-authored-by: Mike Bostock <[email protected]>

* Update src/scales.js

Co-authored-by: Mike Bostock <[email protected]>

* d3.utcDay-like intervals do not parse string dates

* reusable interval option

* When the interval option is applied on a quantitative scale, generate the ticks with the interval; also set the tickFormat so that we don't show 1.0, 2.0, 3.0 if the interval is an integer.

* tests

* normalize intervals

lists a few TODOs re: the default tick format:
- we don't want decimal notation if the interval is specified as an integer
- we don't want months to appear if the interval is specified as d3.utcYear
- we don't want years to appear with commas (#768)

* formatDefault for ordinal scales

* Update README

* call maybeInterval sooner

* tabular-nums for interval’d ordinal axes

Co-authored-by: Mike Bostock <[email protected]>
mbostock added a commit that referenced this issue Jun 10, 2022
* ordinal interval

* fix test
(913629f)

* date scale interval & warning (#852)

* Specifying a scale interval shows the intent of having ordinal numerical or ordinal dates: suppress warning.

Side note: if a numeric interval was specified, string numerics have already been coerced to numbers by the scale transform; so this in fact only has consequences for ordinal dates, such as in the downloads-ordinal test plot.

* document scale intervals

* test plot with year intervals

* Update src/scales.js

Co-authored-by: Mike Bostock <[email protected]>

* Update src/scales.js

Co-authored-by: Mike Bostock <[email protected]>

* Update src/scales.js

Co-authored-by: Mike Bostock <[email protected]>

* d3.utcDay-like intervals do not parse string dates

* reusable interval option

* When the interval option is applied on a quantitative scale, generate the ticks with the interval; also set the tickFormat so that we don't show 1.0, 2.0, 3.0 if the interval is an integer.

* tests

* normalize intervals

lists a few TODOs re: the default tick format:
- we don't want decimal notation if the interval is specified as an integer
- we don't want months to appear if the interval is specified as d3.utcYear
- we don't want years to appear with commas (#768)

* formatDefault for ordinal scales

* Update README

* call maybeInterval sooner

* tabular-nums for interval’d ordinal axes

Co-authored-by: Mike Bostock <[email protected]>

* Update README

* options.interval is not normalized here

Co-authored-by: Philippe Rivière <[email protected]>
@mbostock
Member Author

mbostock commented Dec 1, 2022

If the domain is contained in, say, [1500, 2200], set the default formatter to be d => `${d}` rather than Intl.NumberFormat.

We should also ensure that all channel values are integers (or at least sample the first ~40? channel values). If we see a fractional value such as 1500.1231041240 we should use the default number format rather than assuming it is a year.
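
A sketch of that integer check, assuming a hypothetical helper `looksLikeYears` that inspects at most the first 40 defined values. The sample size and the [1500, 2200] bounds come from the discussion; the function name, option names, and the treatment of missing values are assumptions.

```javascript
// Hypothetical helper: returns true only if the sampled values are all
// integers within [min, max]. Inspects at most `sampleSize` defined values.
function looksLikeYears(values, { sampleSize = 40, min = 1500, max = 2200 } = {}) {
  let seen = 0;
  for (const v of values) {
    if (v == null) continue; // skip missing values (an assumption)
    // A fractional, non-numeric, or out-of-range value rules out years.
    if (!Number.isInteger(v) || v < min || v > max) return false;
    if (++seen >= sampleSize) break;
  }
  return seen > 0; // no defined values means no evidence of years
}

console.log(looksLikeYears([1999, 2000, 2001])); // true
console.log(looksLikeYears([1500.123104124, 1600])); // false
```

Sampling keeps the check cheap on large datasets, at the cost of missing a fractional value that first appears after the sampled prefix.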

Fil added a commit that referenced this issue Feb 10, 2023
TODO
* should trigger on the interval option being integer
* band-integer (e.g., yearly-request)
* find a nicer way to pass the information up
* test for the [-10, 10] cases—maybe more what about 10-30?
@mbostock
Member Author

If the scale has an interval option that is a year (or a multiple of a year), it seems like we could at least special-case that to use the %Y format and drop the -01-01.

@mbostock
Member Author

mbostock commented May 15, 2023

I tried just dropping -01-01 (and -01) from isoformat in #1556, but I think we need to be a little smarter and detect intervals, since otherwise with ordinal scales you are more likely to end up with inconsistent formatting of dates. (Admittedly this is already a problem with sub-daily intervals, such as hours that fall on midnight, but it does exacerbate the problem.)

For example, below, we could check the domain of the x scale and choose the shortest format that applies to all of the dates in the domain (YYYY-MM-DD) rather than choosing the shortest format for each value independently.
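
The idea of choosing one format for the whole domain can be sketched as follows. This is an illustration in plain JavaScript, not Plot's implementation; `sharedDateFormat` is a hypothetical helper, and it assumes valid UTC dates with years from 0 to 9999 so that `toISOString` yields a four-digit year.

```javascript
// Hypothetical helper: picks the shortest ISO 8601 prefix that represents
// every date in the domain without loss, then formats all dates with it.
function sharedDateFormat(dates) {
  let length = 4; // start with YYYY
  for (const date of dates) {
    const hasTime =
      date.getUTCHours() !== 0 ||
      date.getUTCMinutes() !== 0 ||
      date.getUTCSeconds() !== 0 ||
      date.getUTCMilliseconds() !== 0;
    if (hasTime) return (d) => d.toISOString(); // a time of day needs the full form
    // YYYY-MM-DD needs 10 characters, YYYY-MM needs 7, YYYY needs 4.
    const needed = date.getUTCDate() !== 1 ? 10 : date.getUTCMonth() !== 0 ? 7 : 4;
    length = Math.max(length, needed);
  }
  return (d) => d.toISOString().slice(0, length);
}

const mixed = [new Date("2020-01-01"), new Date("2020-06-15"), new Date("2021-01-01")];
const formatMixed = sharedDateFormat(mixed);
console.log(mixed.map(formatMixed)); // ["2020-01-01", "2020-06-15", "2021-01-01"]

const yearly = [new Date("2020-01-01"), new Date("2021-01-01")];
console.log(yearly.map(sharedDateFormat(yearly))); // ["2020", "2021"]
```

Formatting each value independently would render the first domain as 2020, 2020-06-15, 2021, which is the inconsistency described above.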

[Image: Screenshot 2023-05-15 at 10.06.17 AM]

@mbostock mbostock changed the title Smarter formatting for year channels? Smarter formatting for year number channels? Aug 23, 2023
@mbostock
Member Author

mbostock commented Aug 23, 2023

#1790 handles this for temporal data. The only challenge left here is that we have very little signal that these represent years rather than arbitrary numbers (which should have commas). The possible heuristics are:

  1. Look at the channel name, and see it’s “year”.
  2. Look at the values, and check if they are integers in the range 1900–2100 (exact range TBD).

(1) is English only, which isn’t great (and we’d need to do word matching for field names like “sales year”). (2) is brittle; it wouldn’t work well for historical data, and it’ll have some false positives for other data (e.g., melting points of metals in Fahrenheit). But it’s also not the end of the world if there’s a false positive, since the only difference would be a missing comma. And we don’t really need commas for four or fewer digits anyway. So we could extend the heuristic to integers in the range 0–9999.
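
The two heuristics can be sketched as follows, in plain JavaScript. Both function names and the regular expression are hypothetical and not part of Plot; the word matching treats any non-letter as a separator, so it accepts "sales year" and "sales_year" but not camelCase names such as "salesYear".

```javascript
// Heuristic 1 (English only): the channel name contains "year" as a
// separate word, case-insensitively. Hypothetical pattern.
const YEAR_WORD = /(^|[^a-z])year($|[^a-z])/i;

function nameSuggestsYear(name) {
  return typeof name === "string" && YEAR_WORD.test(name);
}

// Heuristic 2 (extended range): every defined value is an integer
// from 0 to 9999, i.e. at most four digits.
function valuesSuggestYear(values) {
  let seen = false;
  for (const v of values) {
    if (v == null) continue; // skip missing values (an assumption)
    if (!Number.isInteger(v) || v < 0 || v > 9999) return false;
    seen = true;
  }
  return seen;
}

console.log(nameSuggestsYear("Year")); // true
console.log(nameSuggestsYear("sales year")); // true
console.log(nameSuggestsYear("yearly total")); // false
console.log(valuesSuggestYear([1066, 1492, 2023])); // true
console.log(valuesSuggestYear([12000, 2023])); // false
```

How the two signals should be combined (either one, or both) is left open here, as it is in the discussion.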

4 participants