Genotype call array count alleles #3
Comments
In the skallel v2 prototype I implemented this as a function called genotypes_count_alleles, with backends for NumPy, Dask and CUDA. I also did some benchmarking of the different backends. Note that for this function a CUDA backend does not add any significant speedup over CPU: this is a low-complexity computation and is mostly I/O-bound. However, because of its simplicity, it may still be worth implementing a CUDA backend just to help work through how we support CUDA backends in general.
For reference, this was implemented in scikit-allel as GenotypeArray.count_alleles().
@alimanfoo are you working on this? I'm happy to put something together for discussion. (I added a few related comments in #11.)
Hi @tomwhite, very happy if you want to put something together here. (Otherwise it's probably the first thing I'd ask Quansight to look at.)
Thanks @alimanfoo. It looks like the actual method code has been written by @eric-czech in https://github.com/pystatgen/sgkit/issues/29#issuecomment-656071795. So the remaining work (if we are broadly happy with that API) is to think about dispatch. For that I think the method functions can take an optional
@alimanfoo @tomwhite now that #36 has been merged, can we close this issue and move discussion of performance and dispatch to more specific issues?
Raising this issue to propose implementing a function to perform an allele count on a genotype array.
The input would be a genotype call array.
The output would be an allele count array.
The function summarises information across samples and ploidy dimensions, counting the number of observations of each allele index (0, 1, 2, etc.) for each variant.
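To make the proposal concrete, here is a minimal NumPy sketch of the computation described above. It is not the skallel, scikit-allel or sgkit implementation. The function name `count_alleles_sketch`, the `n_alleles` parameter, the `(variants, samples, ploidy)` array layout and the convention that negative values mark missing calls are all assumptions made for illustration.

```python
import numpy as np


def count_alleles_sketch(gt, n_alleles=None):
    """Count allele observations per variant (hypothetical helper).

    gt: integer array of shape (variants, samples, ploidy), assumed layout.
    Negative values are assumed to mean "missing call" and are not counted.
    Returns an integer array of shape (variants, n_alleles).
    """
    gt = np.asarray(gt)
    if gt.ndim != 3:
        raise ValueError("expected an array of shape (variants, samples, ploidy)")
    if n_alleles is None:
        # Infer the number of alleles from the largest allele index present.
        n_alleles = max(int(gt.max()) + 1, 0) if gt.size else 0

    n_variants = gt.shape[0]
    # Collapse the samples and ploidy dimensions into one axis per variant.
    flat = gt.reshape(n_variants, -1)

    ac = np.zeros((n_variants, n_alleles), dtype=np.int64)
    for allele in range(n_alleles):
        # Count how many calls equal this allele index for each variant.
        ac[:, allele] = np.sum(flat == allele, axis=1)
    return ac


# Two variants, three diploid samples; -1 marks a missing call.
gt = np.array([
    [[0, 0], [0, 1], [1, 1]],
    [[0, 2], [-1, -1], [2, 2]],
])
ac = count_alleles_sketch(gt)
print(ac)
# [[3 3 0]
#  [1 0 3]]
```

The loop runs over allele indices rather than variants, so the per-variant work stays vectorised. Each output row depends only on the matching input row, which is why a chunked backend could apply the same function block by block along the variants dimension.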