Skip to content

Conversation

@mkfreeman
Copy link
Contributor

@mkfreeman mkfreeman commented Dec 8, 2021

Added four more reducers that can be set as the basis for Plot.normalize:

  • min
  • max
  • deviation
  • variance

See this notebook for a demo.

@mbostock
Copy link
Member

mbostock commented Dec 8, 2021

If you use the “deviation” basis, it means that you’re taking a set X = {x1, x2, x3, …} and mapping them to X′ = {x1 / d, x2 / d, x3 / d, …} where d = deviation(X). Is that what you expect? That seems surprising to me. I think that I would expect something more like {(x1 - m) / d, (x2 - m) / d, …} where m = median(X).

Copy link
Contributor

@Fil Fil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max is a nice addition (make it fit!), and min is symmetric for negative values. We might also want to normalize by extent (box all the values between min and max). (EDITED: we already have extent)

I can also see the use-case for deviation, in particular with a fixed domain of, say, [-3, 3]. Agree with @mbostock, that it should be centered, but I'd argue for centering around the mean not the median, per the std dev definition.

Re: variance, unless there is a real world use-case that I don't know of, I don't think it's a good idea: the variance is not in the same unit as the original data, but in unit squared, so the normalized value of data in meters would result in values expressed in m^-1, which I think is very confusing.

@mbostock
Copy link
Member

mbostock commented Dec 8, 2021

I agree with Fil’s recommendations.

@mkfreeman
Copy link
Contributor Author

mkfreeman commented Dec 9, 2021

Thanks for the review! I made the following changes:

  • Removed the variance option
  • Defined deviation using the suggested formula above: X′ ={(x1 - m) / d, (x2 - m) / d, …} (where m is mean)

@mkfreeman
Copy link
Contributor Author

mkfreeman commented Dec 9, 2021

The min normalization creates a plot with the domain [Infinity, Infinity] if the minimum value is zero -- is this an expected (acceptable) behavior? Example in the notebook
Screen Shot 2021-12-08 at 8 43 28 PM

Copy link
Member

@mbostock mbostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks good. 👍 Would you mind also updating the README, too?

const normalizeMedian = normalizeBasis((I, S) => median(I, i => S[i]));
const normalizeSum = normalizeBasis((I, S) => sum(I, i => S[i]));
const normalizeMin = normalizeBasis((I, S) => min(I, i => S[i]));
const normalizeSum = normalizeBasis((I, S) => sum(I, i => S[i]));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please preserve the trailing newline at the end of this file. 🙏

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated (and preserved!). Let me know if you prefer a different phrasing or ordering (I left them as alphabetical, with extent and deviation at the end, as they follow a different pattern than those above)

README.md Outdated

The Plot.normalizeX and Plot.normalizeY transforms normalize series values relative to the given basis. For example, if the series values are [*y₀*, *y₁*, *y₂*, …] and the *first* basis is used, the mapped series values would be [*y₀* / *y₀*, *y₁* / *y₀*, *y₂* / *y₀*, …] as in an index chart. The **basis** option specifies how to normalize the series values. The following basis methods are supported:


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now there’s an extra newline. 😂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 🤦

Copy link
Member

@mbostock mbostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Fil Fil changed the title Add normalize reducers min, max, deviation, variance Add normalize reducers min, max, deviation Dec 9, 2021
@Fil Fil force-pushed the mkfreeman/normalize branch from 01ea14f to b3e4bf9 Compare December 9, 2021 07:20
Copy link
Contributor

@Fil Fil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's normal to crash on min if the minimum is 0 (the max normalization will also give NaNs if the max is zero.)

For deviation, we can avoid a crash since in that case, all values are equal, and it makes sense to normalize the value to 0 (it will happen, for example, on a facet for which there is only one data point: better to show it that not).

I've added a few unit tests, and have decorated your commits with your actual email (instead of [email protected]).

In your terminal, you can do:

$ git config --global user.name "John Doe"
$ git config --global user.email [email protected]

@mkfreeman mkfreeman merged commit c8849c5 into main Dec 9, 2021
@mkfreeman mkfreeman deleted the mkfreeman/normalize branch December 9, 2021 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants