Open
Labels: Algorithmic (Algorithmic Improvements), bug (Something isn't working)
Description
This addresses the conversation on @artemy-bakulin's original commit fe68880.
Artemy brings up the point about class imbalance here: https://github.com/noamteyssier/pypage/issues/33#issuecomment-1167958248
I agree with the new order of operations, but we need to address the following bug in this commit's current form.
The Bug
The current form returns a bin_array of size n_genes regardless of the size of the gene subset provided.
It currently fails the following test:
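import numpy as np

# ExpressionProfile is the pypage class under test.

N_GENES = 1000
T = 100

def get_expression() -> (np.ndarray, np.ndarray):
    # Build N_GENES gene names with random, normally distributed scores.
    genes = np.array([f"g.{g}" for g in np.arange(N_GENES)])
    scores = np.random.normal(size=N_GENES)
    return genes, scores

def test_subsetting():
    for _ in np.arange(T):
        genes, expression = get_expression()
        exp = ExpressionProfile(genes, expression)
        # Keep roughly half of the genes at random.
        subset = genes[np.random.random(genes.size) < 0.5]
        bin_sub = exp.get_gene_subset(subset)
        # The returned bins should cover only the requested genes.
        assert bin_sub.size == subset.size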
Solution
This could be fixed by adjusting _build_bool_array or _build_bin_array to drop the indices that were never set (initializing bool_array with np.full and a fill value of -1 instead of np.zeros, so that unset positions can be told apart from real values).
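A minimal sketch of the sentinel idea described above, not the actual pypage implementation. The helper name `build_bin_array_for_subset` and its arguments are hypothetical, and it assumes bin labels are non-negative integers so that -1 can safely mark "unset":

```python
import numpy as np

def build_bin_array_for_subset(all_genes, bins, subset):
    """Hypothetical helper: return the bins of the genes in `subset` only."""
    # Map each gene name to its position in the full profile.
    index_of = {g: i for i, g in enumerate(all_genes)}
    # Start from a sentinel (-1 means "unset") rather than zeros, so genes
    # outside the subset cannot be confused with genes that sit in bin 0.
    full = np.full(all_genes.size, -1, dtype=int)
    for g in subset:
        i = index_of.get(g)
        if i is not None:
            full[i] = bins[i]
    # Drop the unset positions; the result has one entry per matched gene.
    return full[full != -1]
```

With np.zeros as the initial value, the final filter would have no way to separate genes that were skipped from genes that belong to bin 0, which is why the sentinel matters here.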
Leaving this open for now; I will circle back once the rest of the merge is complete.