[r] add iterative lsi design docs #167
base: design-docs
Conversation
Thanks for this update, very clearly written.

Discussion points:

Comments:
For your second comment point, my interpretation from that conversation was that the clustering would be separate, but we would still pseudobulk within the function. By all means, we can take that out and put it into the wrapper iterative LSI function instead! Do you think it would make sense to make variance vs. dispersion a parameter of the same function, rather than having separate functions? Overall I agree with your other points. Will reflect here soon!
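A rough sketch of what that single parameterized function could look like. This is purely hypothetical: `select_features` is a placeholder name, and a plain features-by-pseudobulk matrix stands in for whatever BPCells object we actually pass around.

```r
# Hypothetical sketch: one feature-selection helper where the ranking statistic
# (variance vs. dispersion) is a `method` parameter instead of two separate functions.
select_features <- function(mat, method = c("variance", "dispersion"), n_features = 2000) {
  method <- match.arg(method)
  means <- rowMeans(mat)
  vars  <- apply(mat, 1, var)
  score <- switch(method,
    variance   = vars,
    dispersion = vars / pmax(means, .Machine$double.eps)  # variance-to-mean ratio
  )
  # Indices of the top-scoring features
  order(score, decreasing = TRUE)[seq_len(min(n_features, nrow(mat)))]
}
```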
I was originally thinking clustering + pseudobulk calculation could happen in iterative_lsi. Then we still have a parameter in iterative_lsi to let the user configure how feature selection happens (which could be e.g. ...).
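As a hypothetical sketch of that structure (the function body below uses base-R svd/kmeans/rowsum as stand-ins for the real BPCells operations, and all names are placeholders), clustering and pseudobulking stay inside the wrapper while feature selection is passed in as an argument:

```r
# Hypothetical sketch: clustering + pseudobulk stay inside iterative_lsi(),
# while the feature-selection step is a user-supplied function.
iterative_lsi <- function(mat, n_iterations = 2, n_features = 500, n_dims = 10,
                          feature_selection = function(pb, n_features) {
                            # default: rank pseudobulk features by variance
                            order(apply(pb, 1, var), decreasing = TRUE)[seq_len(min(n_features, nrow(pb)))]
                          }) {
  features  <- seq_len(nrow(mat))
  embedding <- NULL
  for (i in seq_len(n_iterations)) {
    sub <- mat[features, , drop = FALSE]
    # TF-IDF + SVD as a stand-in for the real LSI step
    tf  <- t(t(sub) / pmax(colSums(sub), 1))
    idf <- log(1 + ncol(sub) / pmax(rowSums(sub > 0), 1))
    embedding <- svd(t(tf * idf), nu = n_dims, nv = 0)$u
    # Clustering and pseudobulk calculation happen here, inside the wrapper
    clusters <- kmeans(embedding, centers = 5)$cluster
    pb <- t(rowsum(t(mat), group = clusters))  # features x clusters pseudobulk
    # User-configurable feature selection on the pseudobulk matrix
    features <- feature_selection(pb, n_features = n_features)
  }
  list(embedding = embedding, features = features)
}
```

The `feature_selection` argument could then accept something like the variance/dispersion helper sketched above, so the wrapper stays fixed while the selection strategy is configurable.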
Pretty much as we discussed during the call!
I think the biggest point of contention is the normalization structure. Normalizations like TF-IDF and Z-score have parameters that are fit to the data. The biggest problem is that we want something that can interoperate with BPCells operations while also returning the calculated information (mean, variance, IDF). Should it follow the same styling as the S3 class for LSI that we are creating, with cell.embeddings / feature.loadings? I propose just having a boolean parameter, with the default returning an IterableMatrix and the other option returning a class that we can project with.
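To make the boolean-switch idea concrete, here is a hypothetical sketch for a TF-IDF normalization (function, argument, and class names are placeholders, and a dense matrix stands in for an IterableMatrix): the default returns only the transformed matrix, while `return_fit = TRUE` returns an S3 object that keeps the fitted statistics (IDF here; mean/variance for a Z-score norm) so new data can be projected with the same parameters.

```r
# Hypothetical sketch of the proposed boolean switch for fitted normalizations.
tfidf_normalize <- function(mat, return_fit = FALSE) {
  tf  <- t(t(mat) / pmax(colSums(mat), 1))
  idf <- log(1 + ncol(mat) / pmax(rowSums(mat > 0), 1))
  normalized <- tf * idf
  if (!return_fit) {
    return(normalized)  # default: just the transformed matrix (IterableMatrix in practice)
  }
  # Otherwise return an S3 object that carries the fitted statistics
  structure(list(data = normalized, idf = idf), class = "tfidf_fit")
}

# Generic + method so the fit object can project new data with the stored IDF
project <- function(object, ...) UseMethod("project")
project.tfidf_fit <- function(object, new_mat, ...) {
  # Assumes new_mat has the same features (rows) as the training data
  tf <- t(t(new_mat) / pmax(colSums(new_mat), 1))
  tf * object$idf
}
```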