Return X/Xw arrays in linear regression output? #112

@kncrabtree

Hello again Raphael,

What do you think about adding the X and/or Xw arrays as hidden outputs of linear_regression, alongside the residuals that are currently returned? The use case is that, after a regression, I'd like to plot the residuals against the regressors. Unfortunately, my input dataframe has many NaN values in the Y and [X, ...] columns, and I'm using the remove_na option of linear_regression for convenience. As a result, I cannot easily associate the values in the returned residuals_ array with the corresponding X values without either reproducing everything pingouin does internally to prepare the data for regression or manually recalculating the residuals from the model coefficients.
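For reference, the manual workaround described above might look roughly like the following sketch. It uses plain NumPy rather than pingouin, and it assumes that remove_na amounts to listwise deletion (dropping any row with a NaN in Y or in any regressor) and that an intercept column is prepended to X. The helper name `fit_with_aligned_design` and the toy data are hypothetical and only serve as illustration.

```python
import numpy as np


def fit_with_aligned_design(X, y):
    """Hypothetical helper: OLS after listwise NaN removal.

    Returns the residuals together with the design matrix rows that were
    actually used, so the two stay aligned row by row.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]

    # Assumption: keep only rows where y and every regressor are present.
    keep = ~np.isnan(y) & ~np.isnan(X).any(axis=1)
    X_kept, y_kept = X[keep], y[keep]

    # Assumption: an intercept column is prepended to the regressors.
    design = np.column_stack([np.ones(len(y_kept)), X_kept])

    # Ordinary least squares via NumPy's least-squares solver.
    coef, *_ = np.linalg.lstsq(design, y_kept, rcond=None)
    residuals = y_kept - design @ coef
    return {"coef": coef, "residuals": residuals, "X": design, "keep": keep}


# Toy data with NaNs scattered across y and both regressors.
x1 = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 3.0, np.nan, 0.0, 4.0, 1.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, np.nan, 13.0, 15.2, 16.8])

fit = fit_with_aligned_design(np.column_stack([x1, x2]), y)

# Each residual now pairs with one row of fit["X"]; column 0 is the intercept,
# so regressor i is fit["X"][:, i + 1].
pairs_x1 = list(zip(fit["X"][:, 1], fit["residuals"]))
```

This is exactly the duplication of effort that returning X from linear_regression would avoid: the mask and the design matrix have to be rebuilt outside the library just to recover which rows the residuals belong to.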

If, however, the X array were returned as a hidden attribute X_, I could plot fit.residuals_ vs fit.X_[:, i] for each regressor. As far as I can tell from the source, this could be done simply by adding X and/or Xw to the output dataframe or dict, exactly as is done for the residuals, with no side effects other than potentially increased memory usage for particularly large regressions.

If this seems like a reasonable feature, I can implement it and submit a pull request. I think I would add a returnx argument to linear_regression (default False), and if True, I would add X and Xw to the output DataFrame as X_ and Xw_ or to the output dict as 'X' and 'Xw'. Let me know what you think about this.