Return X/Xw arrays in linear regression output?

Hello again Raphael,

What do you think about adding the X and/or Xw array as hidden outputs in `linear_regression` along with the residuals that are currently returned? The use case is that following regression, I'd like to plot the residuals vs the regressors. Unfortunately, my input dataframe has many NaN values in each of the Y and [X,...] columns, and I'm using the `remove_na` option in the `linear_regression` function for convenience. As a result, though, I cannot easily associate the values in the returned `residuals_` array with the appropriate X values without reproducing everything that was done inside pingouin to prepare the data for regression, or by manually recalculating the residuals from the model coefficients.
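
For reference, here is a minimal sketch of the manual workaround I mean. It uses plain NumPy rather than pingouin itself, and the toy data and variable names (`keep`, `X_clean`, `design`) are made up for illustration. It assumes listwise deletion of any row with a missing value, which is my understanding of what `remove_na` does.

```python
import numpy as np

# Toy data with missing values in both the response and the regressors.
y = np.array([1.0, 2.1, np.nan, 4.2, 4.9, 6.1])
X = np.array([
    [1.0, 0.5],
    [2.0, np.nan],
    [3.0, 1.5],
    [4.0, 2.5],
    [5.0, 2.0],
    [6.0, 3.5],
])

# Listwise deletion: keep only rows where y and every regressor are finite.
keep = np.isfinite(y) & np.isfinite(X).all(axis=1)
X_clean, y_clean = X[keep], y[keep]

# Ordinary least squares with an intercept column prepended.
design = np.column_stack([np.ones(len(y_clean)), X_clean])
coef, *_ = np.linalg.lstsq(design, y_clean, rcond=None)

# These residuals line up row-for-row with X_clean, so they can be
# plotted against X_clean[:, j] for each regressor j.
residuals = y_clean - design @ coef
```

This works, but it duplicates the data preparation outside the library, which is the step I would like to avoid.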

If, however, the X array were returned as a hidden attribute `X_`, I could plot `fit.residuals_` vs `fit.X_[:, i]` for each regressor. As far as I can tell from the source, this could be done simply by adding X and/or Xw to the output dataframe or dict, exactly as is done for the residuals, with no side effects other than potentially increased memory usage for particularly large regressions.

If this seems like a reasonable feature, I can implement it and submit a pull request. I think I would add a `returnx` argument to `linear_regression` (default `False`), and if `True`, I would add X and Xw to the output DataFrame as `X_` and `Xw_`, or to the output dict as `'X'` and `'Xw'`. Let me know what you think about this.
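
To make the proposal concrete, here is a rough sketch of the behaviour I have in mind for the dict output. The function `linear_regression_sketch` is a hypothetical stand-in written with NumPy only; it is not pingouin's code, and the way it builds `Xw` (scaling rows by the square root of the weights) and computes unweighted residuals are simplifying assumptions of this sketch.

```python
import numpy as np

def linear_regression_sketch(X, y, weights=None, remove_na=False, returnx=False):
    """Hypothetical stand-in illustrating the proposed `returnx` flag."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    w = np.ones(len(y)) if weights is None else np.asarray(weights, dtype=float)

    if remove_na:
        # Listwise deletion of rows with any missing value.
        keep = np.isfinite(y) & np.isfinite(X).all(axis=1)
        X, y, w = X[keep], y[keep], w[keep]

    # Prepend the intercept, then apply the weights (assumed sqrt scaling).
    X = np.column_stack([np.ones(len(y)), X])
    sw = np.sqrt(w)
    Xw = X * sw[:, None]
    coef, *_ = np.linalg.lstsq(Xw, y * sw, rcond=None)

    out = {"coef": coef, "residuals": y - X @ coef}
    if returnx:
        # Proposed addition: expose the arrays actually used in the fit.
        out["X"] = X
        out["Xw"] = Xw
    return out

X = np.array([[1.0], [2.0], [np.nan], [4.0], [5.0]])
y = np.array([1.1, 1.9, 3.2, np.nan, 5.1])
fit = linear_regression_sketch(X, y, remove_na=True, returnx=True)
```

With this, `fit["residuals"]` and `fit["X"][:, 1]` have the same length and order, so they can be plotted against each other directly (column 0 is the intercept in this sketch).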

Return X/Xw arrays in linear regression output? #112
