Description
Hello again Raphael,
What do you think about adding the X and/or Xw array as hidden outputs in linear_regression, along with the residuals that are currently returned? The use case is that, following a regression, I'd like to plot the residuals against the regressors. Unfortunately, my input dataframe has many NaN values in the Y and [X,...] columns, and I'm using the remove_na option of linear_regression for convenience. As a result, I cannot easily associate the values in the returned residuals_ array with the appropriate X values without either reproducing everything pingouin does internally to prepare the data for regression or manually recalculating the residuals from the model coefficients.
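A minimal sketch of the manual workaround described above, written with plain pandas and NumPy rather than pingouin. It assumes that remove_na amounts to listwise deletion across y and all regressors and that an intercept is added. The toy dataframe and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data with missing values scattered across y and one regressor
# (illustrative values only).
df = pd.DataFrame({
    "y":  [1.0, 2.1, np.nan, 4.2, 5.1, 5.9],
    "x1": [1.0, 2.0, 3.0, np.nan, 5.0, 6.0],
    "x2": [0.5, 0.1, 0.9, 0.3, 0.7, 0.2],
})
predictors = ["x1", "x2"]

# Listwise deletion: keep only rows complete in y and every regressor.
clean = df.dropna(subset=["y"] + predictors)

# Design matrix with an intercept column, then an ordinary least-squares fit.
X = np.column_stack([np.ones(len(clean)), clean[predictors].to_numpy()])
y = clean["y"].to_numpy()
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals recomputed by hand, indexed like the rows that survived deletion,
# so they can be plotted against clean["x1"] or clean["x2"] directly.
residuals = pd.Series(y - X @ coef, index=clean.index, name="residuals")
```

This works, but it duplicates the data preparation outside the library, which is the step the request is trying to avoid.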
If, however, the X array were returned as a hidden attribute X_, I could plot fit.residuals_ vs fit.X_[i] for each regressor. As far as I can tell from the source, this could be done simply by adding X and/or Xw to the output dataframe or dict exactly as is done for the residuals, without any side effects except potentially increased memory usage for particularly large regressions.
If this seems like a reasonable feature, I can implement it and submit a pull request. I think I would add a returnx argument to linear_regression (default False), and if True, I would add X and Xw to the output DataFrame as X_ and Xw_ or to the output dict as 'X' and 'Xw'. Let me know what you think about this.
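To make the proposal concrete, here is a rough sketch of the dict-output case. The function sketch_linear_regression is a hypothetical stand-in, not pingouin's implementation. The NaN handling, the square-root weighting used to build Xw, and the residuals being reported on the weighted scale are all assumptions made for illustration.

```python
import numpy as np


def sketch_linear_regression(X, y, weights=None, remove_na=False, returnx=False):
    """Hypothetical stand-in for the proposed behaviour; not pingouin's code."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if weights is not None:
        weights = np.asarray(weights, dtype=float)

    if remove_na:
        # Assumed listwise deletion: drop rows with a NaN in y or any regressor.
        keep = ~(np.isnan(X).any(axis=1) | np.isnan(y))
        X, y = X[keep], y[keep]
        if weights is not None:
            weights = weights[keep]

    # Prepend an intercept column.
    X = np.column_stack([np.ones(len(y)), X])

    # Assumed weighting scheme: scale rows by the square root of the weights.
    if weights is None:
        Xw, yw = X, y
    else:
        sw = np.sqrt(weights)
        Xw, yw = X * sw[:, None], y * sw

    coef, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    out = {"coef": coef, "residuals": yw - Xw @ coef}
    if returnx:
        # The proposed addition: expose the arrays actually used in the fit.
        out["X"] = X
        out["Xw"] = Xw
    return out


x = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
y = np.array([1.1, np.nan, 3.2, 3.9, 5.2])
fit = sketch_linear_regression(x, y, remove_na=True, returnx=True)
# fit["X"][:, 1] lines up row for row with fit["residuals"].
```

With returnx left at its default of False, the output is unchanged, so existing callers would not pay the extra memory cost.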