Skip to content

dcrt0(1/4): API2: add comments and docstring of the functions #220

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 107 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
107 commits
Select commit Hold shift + click to select a range
583e684
Add comments to _lambda_alpha_max
lionelkusch Jan 3, 2025
2a769b9
Remove loss parameter
lionelkusch Jan 3, 2025
b8e70b0
Add full parameters for the frd threshold
lionelkusch Jan 3, 2025
9cbcd93
Add comments
lionelkusch Jan 3, 2025
337df18
Remove uppercase letter
lionelkusch Jan 8, 2025
cea2b2e
Add bibliography
lionelkusch Jan 8, 2025
b67c0de
Add reference
lionelkusch Jan 8, 2025
8aa6707
Remove sigma_zero
lionelkusch Jan 8, 2025
b06a946
Fix some typo and function
lionelkusch Jan 8, 2025
5545804
Fix bugs in tests
lionelkusch Jan 8, 2025
3d53a64
Fix format
lionelkusch Jan 8, 2025
6071d18
add link to original implementation
lionelkusch Jan 8, 2025
a7b6af5
At todo on difference of implementation
lionelkusch Jan 8, 2025
1b4e3c5
improve dcrt
lionelkusch Jan 9, 2025
eb33ac4
mprove warning message
lionelkusch Jan 9, 2025
a3e83da
Add warning
lionelkusch Jan 9, 2025
a680fbf
Format document
lionelkusch Jan 9, 2025
e9c5a2b
Add new assert
lionelkusch Jan 9, 2025
9fa0dca
Add new function in the init
lionelkusch Jan 10, 2025
92d052e
Include the new funvtion in the documentation
lionelkusch Jan 10, 2025
3729fd5
Improve comments
lionelkusch Jan 13, 2025
768034d
Format file
lionelkusch Jan 17, 2025
3153e68
Merge branch 'main' into PR_dcrt_comment
lionelkusch Jan 17, 2025
456b47b
formating files
lionelkusch Jan 17, 2025
db70f26
Update a comment
lionelkusch Jan 17, 2025
95bca86
Improve coverage
lionelkusch Jan 17, 2025
a813248
Format file
lionelkusch Jan 17, 2025
594d281
Fix bug in examples
lionelkusch Jan 20, 2025
70d6f5d
Split tests
lionelkusch Jan 20, 2025
0248642
Format file
lionelkusch Jan 20, 2025
c0ac779
Modify a docstring
lionelkusch Jan 20, 2025
ee49aca
Fix example
lionelkusch Jan 20, 2025
e984d42
Formating
lionelkusch Jan 20, 2025
820baaf
Merge branch 'main' into PR_dcrt_comment
lionelkusch Mar 10, 2025
ddfad20
Remove unecessary test
lionelkusch Mar 10, 2025
c5ae1ab
Remove option
lionelkusch Mar 13, 2025
2c92f61
Format
lionelkusch Mar 13, 2025
8e88847
Remove random generator in LassoCV
lionelkusch Mar 17, 2025
e606327
Change usage of alpha
lionelkusch Mar 26, 2025
72e2600
Merge branch 'main' into PR_dcrt_comment
lionelkusch Mar 26, 2025
bd5af17
format file
lionelkusch Mar 26, 2025
638fa8d
Add a comment
lionelkusch Mar 26, 2025
426c154
Add space for testing CI
lionelkusch Mar 26, 2025
cdea6a3
fix format
lionelkusch Mar 26, 2025
8970a68
improve example
lionelkusch Mar 27, 2025
a203221
Include modification
lionelkusch Mar 27, 2025
64e672b
rename some variable
lionelkusch Mar 27, 2025
892f8ef
format document
lionelkusch Mar 27, 2025
cb92ff0
change way to pass argument to lasso
lionelkusch Apr 2, 2025
1b06f70
add test for testing dictionnary
lionelkusch Apr 2, 2025
cd0147c
Format
lionelkusch Apr 2, 2025
24f0c9f
change name of parameters
lionelkusch Apr 2, 2025
2751f9f
modify randomforest to random_forest
lionelkusch Apr 3, 2025
b0f847c
Update docstring
lionelkusch Apr 3, 2025
7377108
Update the description of the tests
lionelkusch Apr 3, 2025
60a21af
change naem selected_variable
lionelkusch Apr 4, 2025
0b7c76f
rename ntree to n_tree
lionelkusch Apr 4, 2025
5307497
Remane sigma2
lionelkusch Apr 4, 2025
919457b
add a function for creating docstring from classes
lionelkusch Apr 7, 2025
1f666fc
pass the function to the class
lionelkusch Apr 7, 2025
9a5b08e
Format file
lionelkusch Apr 7, 2025
e0a9f60
update comments
lionelkusch Apr 7, 2025
38640ac
Merge branch 'main' into PR_dcrt_comment
lionelkusch Apr 7, 2025
cc28798
format
lionelkusch Apr 7, 2025
b4aeed7
Fix bug
lionelkusch Apr 7, 2025
d462e25
add test for dcrt function
lionelkusch Apr 7, 2025
d9e42c0
fit return self
lionelkusch Apr 8, 2025
71c5d37
Apply suggestions from code review
lionelkusch Apr 8, 2025
ab53ca2
change name in plot for the instance of the object
lionelkusch Apr 8, 2025
473d1f2
format the file
lionelkusch Apr 8, 2025
e30fbfe
reduce the size of the docstring for tests
lionelkusch Apr 8, 2025
c6021fc
add check_fit method
lionelkusch Apr 8, 2025
9a6cc83
format file
lionelkusch Apr 8, 2025
a10668f
add doctring for check_fit
lionelkusch Apr 8, 2025
b00954a
add a test for check_fit
lionelkusch Apr 8, 2025
7d78089
format file
lionelkusch Apr 8, 2025
3e09b78
Apply suggestions from code review
lionelkusch Apr 8, 2025
b2718ed
small modification
lionelkusch Apr 8, 2025
70446ac
minor modification on formating
lionelkusch Apr 9, 2025
8846ec5
add size of attribute
lionelkusch Apr 9, 2025
b4b53b2
iorder alphabetic the attribut
lionelkusch Apr 9, 2025
3291da3
add default value in docstring
lionelkusch Apr 9, 2025
ccc51d2
change default valur of percentile
lionelkusch Apr 9, 2025
7898115
remove fdr aggregation
lionelkusch Apr 10, 2025
a94312a
update docstring
lionelkusch Apr 10, 2025
3868c38
Merge branch 'main' into PR_dcrt_comment
lionelkusch Apr 10, 2025
99f2d7e
remove sreening for non lasso
lionelkusch Apr 10, 2025
a9b4b0f
Update src/hidimstat/dcrt.py
lionelkusch Apr 10, 2025
1a1ac5f
Merge branch 'PR_dcrt_comment' of https://github.com/lionelkusch/hidi…
lionelkusch Apr 10, 2025
a88c078
format file
lionelkusch Apr 10, 2025
7aa8def
Apply suggestions from code review
lionelkusch Apr 11, 2025
36a4d57
Move aggregation to _utils file
lionelkusch Apr 11, 2025
b435399
updtae for frp
lionelkusch Apr 11, 2025
8d9a946
fix bug
lionelkusch Apr 11, 2025
bf23937
Merge branch 'main' into PR_dcrt_comment
lionelkusch Apr 15, 2025
91b4eb3
move the function for docstrig in utils
lionelkusch Apr 15, 2025
181ed7e
Fix import
lionelkusch Apr 15, 2025
05ee677
Merge branch 'main' into PR_dcrt_comment
lionelkusch Apr 15, 2025
34e2728
Merge branch 'main' into PR_dcrt_comment
lionelkusch Apr 25, 2025
85f3033
update example
lionelkusch Apr 25, 2025
82b7ef7
update api
lionelkusch Apr 25, 2025
853b53e
Merge branch 'PR_dcrt_comment' of https://github.com/lionelkusch/hidi…
lionelkusch Apr 25, 2025
309d982
fix bug
lionelkusch Apr 25, 2025
0284cbb
fix condition
lionelkusch Apr 28, 2025
2b9cdbf
fix bug
lionelkusch Apr 28, 2025
7a24bda
fix tests
lionelkusch Apr 28, 2025
447d2a3
Merge branch 'main' into PR_dcrt_comment
lionelkusch Jun 17, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions doc_conf/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -19,8 +19,6 @@ Functions
quantile_aggregation
clustered_inference
clustered_inference_pvalue
dcrt_zero
dcrt_pvalue
desparsified_lasso
desparsified_lasso_pvalue
desparsified_group_lasso_pvalue
Expand All @@ -40,3 +38,4 @@ Classes
LOCO
CPI
PFI
D0CRT
19 changes: 8 additions & 11 deletions examples/plot_dcrt_example.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,8 @@

import matplotlib.pyplot as plt
import numpy as np
from hidimstat.dcrt import dcrt_zero, dcrt_pvalue

from hidimstat.dcrt import D0CRT
from hidimstat._utils.scenario import multivariate_1D_simulation

plt.rcParams.update({"font.size": 21})
Expand Down Expand Up @@ -51,20 +52,16 @@
y = np.maximum(0.0, y)

## dcrt Lasso ##
selection_features, X_res, sigma2, y_res = dcrt_zero(X, y, screening=False)
variables_important_lasso, pvals_lasso, ts_lasso = dcrt_pvalue(
selection_features, X_res, sigma2, y_res
)
d0crt_lasso = D0CRT(screening=False, statistic="residual")
d0crt_lasso.fit(X, y)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
d0crt_lasso.fit(X, y)
d0crt_lasso = D0CRT(screening=False, statistic="residual").fit(X, y)

variables_important_lasso, pvals_lasso = d0crt_lasso.importance()
typeI_error["Lasso"].append(sum(pvals_lasso[n_signal:] < alpha) / (p - n_signal))
power["Lasso"].append(sum(pvals_lasso[:n_signal] < alpha) / (n_signal))

## dcrt Random Forest ##
selection_features, X_res, sigma2, y_res = dcrt_zero(
X, y, screening=False, statistic="random_forest"
)
rvariables_important_forest, pvals_forest, ts_forest = dcrt_pvalue(
selection_features, X_res, sigma2, y_res
)
d0crt_random_forest = D0CRT(screening=False, statistic="random_forest")
d0crt_random_forest.fit(X, y)
variables_important_forest, pvals_forest = d0crt_random_forest.importance()
typeI_error["Forest"].append(sum(pvals_forest[n_signal:] < alpha) / (p - n_signal))
power["Forest"].append(sum(pvals_forest[:n_signal] < alpha) / (n_signal))

Expand Down
16 changes: 4 additions & 12 deletions examples/plot_model_agnostic_importance.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@
from sklearn.model_selection import KFold
from sklearn.svm import SVC

from hidimstat import LOCO, dcrt_pvalue, dcrt_zero
from hidimstat import LOCO, D0CRT

#############################################################################
# Generate data where classes are not linearly separable
Expand Down Expand Up @@ -65,17 +65,9 @@
# test (:math:`H_0: X_j \perp\!\!\!\perp y | X_{-j}`) for each variable. However,
# this test is based on a linear model (LogisticRegression) and fails to reject the null
# in the presence of non-linear relationships.
selection_features, X_residual, sigma2, y_res = dcrt_zero(
X, y, problem_type="classification", screening=False
)
_, pval_dcrt, _ = dcrt_pvalue(
selection_features=selection_features,
X_res=X_residual,
y_res=y_res,
sigma2=sigma2,
fdr=0.05,
)

d0crt = D0CRT(problem_type="classification", screening=False)
d0crt.fit(X, y)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
d0crt.fit(X, y)
d0crt = D0CRT(problem_type="classification", screening=False).fit(X, y)

_, pval_dcrt = d0crt.importance(fpr=0.05)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is already mentioned in another PR, but importance should not take and fpr argument.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this was mentioned in issue #217.
However, there is no consensus if this function has a X_test and Y_test or not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all importance methods should have should have an X=None, Y=None, as arguments, that would default to the already provided training data. In some cases, these arguments would not even be taken into account.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that all the importance methods will have an X and y is not yet defined.
Using an X_test and y_test will change a bit the algorithm for computing the residual on this test data.

As a user, it's more readable to add additional optional parameters to a function than to have parameters which don't have any effect.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is about API uniformity: you want to be able to loop over methods with maximally similar arguments without triggering an error. The mechanism is fundamental in sklearn and related libraries.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you indicate to me one function in sklearn where some parameters are not used by the function due to the homogenisation of the API?


################################################################################
# Compute p-values using LOCO
Expand Down
7 changes: 4 additions & 3 deletions src/hidimstat/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
desparsified_lasso_pvalue,
desparsified_group_lasso_pvalue,
)
from .dcrt import d0crt, D0CRT
from .conditional_permutation_importance import CPI
from .knockoffs import (
model_x_knockoff,
Expand All @@ -22,8 +23,8 @@
from .leave_one_covariate_out import LOCO
from .noise_std import reid
from .permutation_feature_importance import PFI

from .statistical_tools.aggregation import quantile_aggregation
from .dcrt import dcrt_zero, dcrt_pvalue

try:
from ._version import __version__
Expand All @@ -36,8 +37,8 @@
"clustered_inference_pvalue",
"ensemble_clustered_inference",
"ensemble_clustered_inference_pvalue",
"dcrt_zero",
"dcrt_pvalue",
"d0crt",
"D0CRT",
"desparsified_lasso",
"desparsified_lasso_pvalue",
"desparsified_group_lasso_pvalue",
Expand Down
115 changes: 115 additions & 0 deletions src/hidimstat/_utils/docstring.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
from copy import deepcopy


def _detection_section(lines):
"""
Detect sections in a numpy-style docstring by identifying section headers and their underlines.

Parameters
----------
lines : list of str
Lines of the docstring to parse.

Returns
-------
list of list of str
List of sections, where each section is a list of lines belonging to that section.
The first section is the summary, followed by other sections like Parameters, Returns, etc.
"""
sections = []
index_line = 1
begin_section = index_line
while len(lines) > index_line:
if "-------" in lines[index_line]:
sections.append(lines[begin_section : index_line - 2])
begin_section = index_line - 1
index_line += 1
sections.append(lines[begin_section : len(lines)])
return sections


def _parse_docstring(docstring):
"""
Parse a numpy-style docstring into its component sections.

Parameters
----------
docstring : str
The docstring to parse, following numpy docstring format.

Returns
-------
dict
Dictionary containing docstring sections with keys like 'short' (summary),
'Parameters', 'Returns', etc. Values are the text content of each section.
"""
lines = docstring.split("\n")
section_texts = _detection_section(lines)
sections = {"short": section_texts[0]}
for section_text in section_texts:
if len(section_text) <= 1 or "---" not in section_text[1]:
sections["short"] = section_text
else:
sections["".join(section_text[0].split())] = section_text
return sections


def _reindent(string):
"""
Reindent a string by stripping whitespace and normalizing line breaks.

Parameters
----------
string : list of str
The string content to reindent.

Returns
-------
str
Reindented string with normalized line breaks and indentation.
"""
new_string = deepcopy(string)
for i in range(len(new_string)):
new_string[i] = "\n" + new_string[i]
new_string = "".join(new_string)
return "\n".join(l.strip() for l in new_string.strip().split("\n"))


def _aggregate_docstring(list_docstring):
"""
Combine multiple docstrings into a single docstring.

This function takes a list of docstrings, parses each one, and combines them into
a single coherent docstring. It keeps the summary from the first docstring,
combines all parameter sections, and uses the return section from the last docstring.

Parameters
----------
list_docstring : list
List of docstrings to be combined. Each docstring should follow
numpy docstring format.

Returns
-------
doctring: str
A combined docstring containing:
- Summary from first docstring
- Combined parameters from all docstrings
- Returns section from last docstring
The returned docstring is properly reindented.
"""
list_line = []
for index, docstring in enumerate(list_docstring):
if docstring is not None:
list_line.append(_parse_docstring(docstring=docstring))

# add summary
final_docstring = deepcopy(list_line[0]["short"])
# add parameter
final_docstring += list_line[0]["Parameters"]
for i in range(1, len(list_line)):
# add paraemter after remove the title section
final_docstring += list_line[i]["Parameters"][2:]
# the last return
final_docstring += list_line[-1]["Returns"]
return _reindent(final_docstring)
Loading