Add linearRegression! #105

mbostock
Member

@mbostock mbostock commented Jan 23, 2021

Examples: https://observablehq.com/@data-workflows/plot-linear-regression

I also folded offsetRange into the range helper method. The linearRegression mark is a little interesting in that I lazily populate the channel values during the transform to avoid materializing extra copies.

@mbostock mbostock requested a review from Fil January 23, 2021 00:01
@mbostock mbostock force-pushed the mbostock/linear-regression branch from 08f6910 to ee9aed7 Compare January 23, 2021 00:05
@mbostock
Member Author

To be slightly more forward-looking, we could use Plot.regression as the name, and then default to type = "linear". Then we could support other types of regressions in the future. (Note that these other types of regressions would likely need to use Plot.line or a custom mark type instead of building on top of Plot.link.)

@Fil
Contributor

Fil commented Jan 23, 2021

Agree with "regression" as a name (or even "trend" or "trendline"), and switching to a line mark rather than a link would make sense—it's more a line than a link.

I love the way it works with z (or in the example, stroke=species), as this is something I'm still finding a bit difficult to do in my experiments (see e.g. this comment).

While I don't dispute its usefulness in many cases, I think there's a risk it raises a lot of questions, in particular if we want to make this a first-class element of Plot.

My first remark is that a linear regression is primarily a statistical analysis and modelling technique, and it's unfortunate if we can't get the model back from Plot. In particular, we would like to get the significance of the correlation back, not only the trend. Currently, if the correlation is too weak to be meaningful, Plot will happily display a line, which might be the wrong thing to do. In most cases we'd also want to get the "predict" function back.
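
As a rough illustration of what "getting the model back" could mean, here is a minimal sketch of an ordinary least-squares fit that returns the slope, the intercept, R² and a predict function instead of only two endpoints. The name `fitLinear` and the shape of the returned object are hypothetical, not part of Plot or of this PR.

```javascript
// Hypothetical helper (not part of Plot): ordinary least squares on paired arrays.
// Returns the fitted model rather than only the line to draw.
function fitLinear(xs, ys) {
  const n = xs.length;
  if (n !== ys.length || n < 2) throw new Error("need at least two paired points");

  // Means of x and y.
  let mx = 0, my = 0;
  for (let i = 0; i < n; ++i) { mx += xs[i]; my += ys[i]; }
  mx /= n; my /= n;

  // Centered sums of squares and cross-products.
  let sxx = 0, sxy = 0, syy = 0;
  for (let i = 0; i < n; ++i) {
    const dx = xs[i] - mx, dy = ys[i] - my;
    sxx += dx * dx; sxy += dx * dy; syy += dy * dy;
  }
  if (sxx === 0) throw new Error("x values are all identical");

  const slope = sxy / sxx;
  const intercept = my - slope * mx;
  // R² is the squared correlation coefficient; defined as 1 when y is constant.
  const rSquared = syy === 0 ? 1 : (sxy * sxy) / (sxx * syy);

  return {slope, intercept, rSquared, n, predict: x => intercept + slope * x};
}
```

R² on its own is not a significance test; a p-value would additionally need the t distribution, which this sketch leaves out. A caller could still use `rSquared` and `n` to decide whether drawing the line is justified at all.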

Second remark: there's more than one way to create trendlines (we might want to base it only on the most recent data, or do an exponential fit, etc.), so I'd imagine that an open system where linear regression could be plugged in would be more versatile than a specific mark.

My third question is how to define what we're correlating when the scales have different types: reading the code, I'm under the impression that we're correlating the "raw" values, but if I were making a log-log plot I'd probably want the analysis to be done on the logarithms of the values. This can be done by applying a transform before the linear regression, but then the scales with type: "log" wouldn't apply.
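
To make the distinction concrete, the sketch below fits a straight line to the base-10 logarithms of both variables, which is equivalent to fitting a power law y = c·x^k to the raw values. The function name `fitPowerLaw` is hypothetical, and nothing here reflects how the PR itself handles scales.

```javascript
// Hypothetical helper: least squares on (log10 x, log10 y) instead of raw values.
// A straight line in log-log space corresponds to y = coefficient * x^exponent.
function fitPowerLaw(xs, ys) {
  const n = xs.length;
  if (n !== ys.length || n < 2) throw new Error("need at least two paired points");
  if (xs.some(v => !(v > 0)) || ys.some(v => !(v > 0))) {
    throw new Error("log-log fit requires strictly positive values");
  }

  const lx = xs.map(v => Math.log10(v));
  const ly = ys.map(v => Math.log10(v));

  // Means in log space.
  const mx = lx.reduce((s, v) => s + v, 0) / n;
  const my = ly.reduce((s, v) => s + v, 0) / n;

  // Centered sums in log space.
  let sxx = 0, sxy = 0;
  for (let i = 0; i < n; ++i) {
    sxx += (lx[i] - mx) * (lx[i] - mx);
    sxy += (lx[i] - mx) * (ly[i] - my);
  }
  if (sxx === 0) throw new Error("x values are all identical");

  const exponent = sxy / sxx;                       // slope in log-log space
  const coefficient = 10 ** (my - exponent * mx);   // 10^intercept
  return {exponent, coefficient, predict: x => coefficient * x ** exponent};
}
```

Fitting the raw values instead would generally give a different line, one that is straight on linear scales but curved once drawn on log scales.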

@mbostock
Member Author

Good feedback. I agree that showing the significance of the correlation is important for understanding the analysis. And decoupling visualization from data processing, like we do with transforms, is more conducive to inspection and extensibility. So I’d like to think more about how we’d expose regression functions. Perhaps we should just integrate with d3-regression as you have already prototyped. It’s pretty awesome that you can already just use a d3-regression function as a transform and it “just works”…
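
For readers unfamiliar with the pattern, here is a self-contained sketch of a regression written as a plain data-to-data function. It is loosely modeled on the kind of output d3-regression's linear generator produces (two endpoints with the model attached as properties), but it is not d3-regression itself; `linearTrend` and its options are hypothetical, and how such a function is wired into a mark is left out.

```javascript
// Hypothetical transform factory: configure accessors once, then call the
// returned function on any dataset to get the two endpoints of the fitted line.
function linearTrend({x = d => d[0], y = d => d[1]} = {}) {
  return function transform(data) {
    const n = data.length;
    if (n < 2) throw new Error("need at least two points");

    // Means and x extent in one pass.
    let mx = 0, my = 0, lo = Infinity, hi = -Infinity;
    for (const d of data) {
      const xv = x(d);
      mx += xv; my += y(d);
      if (xv < lo) lo = xv;
      if (xv > hi) hi = xv;
    }
    mx /= n; my /= n;

    // Centered sums of squares and cross-products.
    let sxx = 0, sxy = 0, syy = 0;
    for (const d of data) {
      const dx = x(d) - mx, dy = y(d) - my;
      sxx += dx * dx; sxy += dx * dy; syy += dy * dy;
    }
    if (sxx === 0) throw new Error("x values are all identical");

    const a = sxy / sxx;      // slope
    const b = my - a * mx;    // intercept
    const predict = v => a * v + b;
    const rSquared = syy === 0 ? 1 : (sxy * sxy) / (sxx * syy);

    // The array is what a line mark would draw; the properties expose the model.
    const line = [[lo, predict(lo)], [hi, predict(hi)]];
    return Object.assign(line, {a, b, predict, rSquared});
  };
}
```

Because the result is an ordinary array of points, it can be handed to anything that draws a line, while `a`, `b`, `predict` and `rSquared` remain available for inspection. Swapping in a different fit only means supplying a different function with the same input and output shape.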

@Fil
Contributor

Fil commented Jan 23, 2021

Yes, I think having d3-regression as an example of a transform that's easy to plug in makes more sense than having it "built-in". I can also imagine a Fourier example, following a similar pattern, with configurable low-pass filters.

@mbostock mbostock closed this Jan 25, 2021
@mbostock mbostock mentioned this pull request Jan 28, 2021
@mbostock mbostock mentioned this pull request Feb 24, 2021