
Generate Bins on ExpressionProfile after Finding Gene Intersection #34

@noamteyssier


This addresses the conversation on @artemy-bakulin's original commit fe68880.
Artemy brings up the point about class imbalance here: https://github.com/noamteyssier/pypage/issues/33#issuecomment-1167958248

I agree with the new order of operations, but we need to address the following bug in this commit's current form.

The Bug

In its current form, the commit returns a bin_array of size n_genes regardless of the size of the gene subset provided.

It currently fails the following test:

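import numpy as np
from pypage import ExpressionProfile  # import path assumed; adjust to your layout

N_GENES = 1000  # genes per simulated expression profile
T = 100         # number of random trials

def get_expression() -> tuple[np.ndarray, np.ndarray]: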
    genes = np.array([f"g.{g}" for g in np.arange(N_GENES)])
    scores = np.random.normal(size=N_GENES)
    return genes, scores

def test_subsetting():
    for _ in np.arange(T):
        genes, expression = get_expression()

        exp = ExpressionProfile(genes, expression)
        subset = genes[np.random.random(genes.size) < 0.5]

        bin_sub = exp.get_gene_subset(subset)
        assert bin_sub.size == subset.size

Solution

Could be fixed by adjusting _build_bool_array or _build_bin_array to drop the entries whose indices were never set (for example, initializing bool_array with a sentinel of -1 via np.full instead of np.zeros, then filtering out the sentinel values).
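
As a rough illustration of the sentinel idea above, here is a minimal standalone sketch. It is not the pypage implementation: `bins_for_subset` and its arguments are hypothetical names, and it assumes bin labels are non-negative integers so that -1 can safely mark "unset".

```python
import numpy as np

def bins_for_subset(genes, bin_array, subset):
    """Hypothetical helper: return bins only for profile genes found in `subset`."""
    # Map each gene name to its position in the full profile.
    index_of = {g: i for i, g in enumerate(genes)}
    # -1 marks "unset"; with np.zeros an unset slot would look like bin 0.
    out = np.full(len(genes), -1, dtype=int)
    for g in subset:
        i = index_of.get(g)
        if i is not None:  # silently skip genes absent from the profile
            out[i] = bin_array[i]
    # Keep only the positions that were actually set.
    return out[out != -1]

genes = np.array([f"g.{g}" for g in range(10)])
bin_array = np.arange(10) % 3
subset = genes[[1, 4, 7]]
sub_bins = bins_for_subset(genes, bin_array, subset)
assert sub_bins.size == subset.size
```

Note that in this sketch the result follows the order of the full profile rather than the order of `subset`, and its size matches `subset.size` only when every subset gene is unique and present in the profile.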

Leaving this open for now; I will circle back once the rest of the merge is complete.


Labels: Algorithmic (Algorithmic Improvements), bug (Something isn't working)
