Genotype call array count alleles #3

Closed
alimanfoo opened this issue Jun 18, 2020 · 6 comments
Labels
core operations: Issues related to domain-specific functionality such as LD pruning, PCA, association testing, etc.
dispatching: Issues related to how we send method calls to different backends

Comments

@alimanfoo
Collaborator

alimanfoo commented Jun 18, 2020

Raising this issue to propose implementing a function to perform an allele count on a genotype array.

The input would be a genotype call array.

The output would be an allele count array.

The function summarises information across samples and ploidy dimensions, counting the number of observations of each allele index (0, 1, 2, etc.) for each variant.
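As an illustration only, the counting described above could be sketched in NumPy as follows. The function name `count_alleles`, the `(variants, samples, ploidy)` axis order, the `max_allele` parameter, and the use of negative values for missing calls are assumptions made for this sketch, not the API agreed in this issue.

```python
import numpy as np


def count_alleles(gt, max_allele=None):
    """Count allele observations per variant (illustrative sketch).

    gt is assumed to be an integer array of shape (variants, samples, ploidy)
    holding allele indices; negative values are assumed to mean "missing".
    Returns an int64 array of shape (variants, max_allele + 1).
    """
    gt = np.asarray(gt)
    if gt.ndim != 3:
        raise ValueError("expected an array of shape (variants, samples, ploidy)")
    if max_allele is None:
        # Fall back to the largest allele index actually observed.
        max_allele = max(int(gt.max()), 0) if gt.size else 0
    n_alleles = max_allele + 1
    # Collapse the samples and ploidy dimensions into one axis per variant.
    flat = gt.reshape(gt.shape[0], -1)
    ac = np.zeros((gt.shape[0], n_alleles), dtype=np.int64)
    for allele in range(n_alleles):
        # Missing (negative) calls never match an allele index, so they are skipped.
        ac[:, allele] = (flat == allele).sum(axis=1)
    return ac


# Two variants, two diploid samples; the second sample is missing at variant 2.
gt = np.array([[[0, 0], [0, 1]],
               [[1, 2], [-1, -1]]])
ac = count_alleles(gt)
```

Each row of `ac` holds the counts for one variant, with column `i` giving the number of observations of allele index `i`.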

@alimanfoo
Collaborator Author

In the skallel v2 prototype I implemented this as a function called genotypes_count_alleles, with backends for numpy, dask and cuda.

I also did some benchmarking of the different backends. Note that for this function a CUDA backend does not add any significant speedup over CPU, because this is a low-complexity computation that is mostly I/O-bound. However, because of its simplicity, it may be worth implementing a CUDA backend just to help work through how we support CUDA backends in general.

@alimanfoo
Collaborator Author

For reference, this was implemented in scikit-allel as GenotypeArray.count_alleles().

@alimanfoo changed the title from "Genotype array count alleles" to "Genotype call array count alleles" on Jun 18, 2020
@tomwhite
Collaborator

tomwhite commented Jul 6, 2020

@alimanfoo are you working on this? I'm happy to put something together for discussion. (I added a few related comments in #11)

@alimanfoo
Collaborator Author

Hi @tomwhite, very happy if you want to put something together here. (Otherwise it's probably the first thing I'd ask quansight to look at.)

@tomwhite
Collaborator

tomwhite commented Jul 9, 2020

Thanks @alimanfoo.

It looks like the actual method code has been written by @eric-czech in https://github.com/pystatgen/sgkit/issues/29#issuecomment-656071795. So the remaining work (if we are broadly happy with that API) is to think about dispatch. For that I think the method functions can take an optional backend parameter, as discussed in #11.
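To make the optional backend parameter idea concrete, here is a minimal sketch of one way it might look. Everything in it is hypothetical: the `_BACKENDS` registry, the `register_backend` decorator, the `count_alleles` signature, and the choice of `"numpy"` as the default are assumptions for illustration, not the design discussed in #11.

```python
import numpy as np

# Hypothetical registry mapping backend names to implementations.
_BACKENDS = {}


def register_backend(name):
    """Register the decorated function as the implementation for `name`."""
    def decorator(func):
        _BACKENDS[name] = func
        return func
    return decorator


@register_backend("numpy")
def _count_alleles_numpy(gt, n_alleles):
    arr = np.asarray(gt)
    # Collapse samples and ploidy so each row holds all calls for one variant.
    flat = arr.reshape(arr.shape[0], -1)
    return np.stack([(flat == a).sum(axis=1) for a in range(n_alleles)], axis=1)


def count_alleles(gt, n_alleles, backend=None):
    """Dispatch to the named backend, defaulting to NumPy (an assumption)."""
    name = backend or "numpy"
    try:
        impl = _BACKENDS[name]
    except KeyError:
        raise ValueError(
            f"unknown backend {name!r}; available: {sorted(_BACKENDS)}"
        ) from None
    return impl(gt, n_alleles)


# One variant, two diploid samples.
result = count_alleles([[[0, 1], [1, 1]]], n_alleles=2, backend="numpy")
```

With this shape, a dask or CUDA implementation would register itself under its own name, and callers that omit `backend` would keep working unchanged.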

@hammer added the core operations and dispatching labels on Jul 9, 2020
@hammer
Contributor

hammer commented Aug 5, 2020

@alimanfoo @tomwhite now that #36 has been merged, can we close this issue and move discussion of performance and dispatch to more specific issues?

3 participants