-
Notifications
You must be signed in to change notification settings - Fork 15
✨Add Schema based CELLxGENE Curator & ✨ enable curation of Feature typed indices #2878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Lukas Heumos <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2878 +/- ##
==========================================
- Coverage 91.84% 91.51% -0.34%
==========================================
Files 69 71 +2
Lines 10807 11140 +333
==========================================
+ Hits 9926 10195 +269
- Misses 881 945 +64 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Deployment URL: https://76a3a8d2.lamindb.pages.dev |
Signed-off-by: Lukas Heumos <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
@@ -8,6 +8,7 @@ | |||
MuDataCurator | |||
SpatialDataCurator | |||
TiledbsomaExperimentCurator | |||
CxGCurator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want a Curator anymore, unless I'm wrong. The idea would be to just provide a schema, no? There wouldn't be any specific curator code for CxG, would it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the Curator needs custom code for missing ontology values (such as "normal"), missing defaults, handling names vs ontology_id, ...
I currently don't think that we can hack all of that into a Schema (some things we can).
I'll first add tests for the low level functionality that I changed/fixed and then I'll open a Slack thread to discuss this topic. Yesterday, I actually had moved a CxG generator function to lamindb.examples.anndata
just to move it back. I understand what you're looking for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function that gets the Schema can probably do some of that. I'll revisit this ASAP but first tests!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Sorry that comment was from me)
It should still not have its Curator code, because the code is just to prepare the instance. We need a different strategy to add control terms; or we can just guide users to prepare the instance with a proper notebook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to have a few schemas available, like cxg, perturbation. Maybe even under lamindb.schemas.
or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is legitimate to have some logic in code and others in a schema. Code and schema are quite different. Evidently the former can't be easily queried for because it's not structured and the later can. Not everything has to fit into a schema. Some could -- as Sunny says -- be part of a "curation notebook" which is likely better a curation script which is likely better some kind of configurable thing in lamindb (a "curator", just that this doesn't have so much to do with what we currently brand as curator which is just a thing that validates the schema).
Is it still possible to move this fix to a new PR? "Fixes a very nasty bug where Schemas that were already saved earlier were returned with changed _aux which led to import field such as the index (via _index_feature_uid) got removed" I think it will make both the fix discussion and the cxg schema discussion more organized in their own PRs. You can re-point this PR to the fix PR. |
@@ -41,3 +41,14 @@ schema_version,entity,organism,source,version | |||
5.2.0,Tissue,all,uberon,2024-08-07 | |||
5.2.0,Gene,human,ensembl,release-110 | |||
5.2.0,Gene,mouse,ensembl,release-110 | |||
5.3.0,CellType,all,cl,2025-02-13 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think the ontology versions information should be parsed from https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/5.3.0/schema.md directly. Otherwise, we are duplicating the information and also increasing the maintenance of this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how to do that reliably... This is a rather free style doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found https://github.com/chanzuckerberg/cellxgene-ontology-guide/blob/main/ontology-assets/ontology_info.json but it's only for 5.3.0+. It also lacks Genes just like the table in what you've shown. They have a more comprehensive set up to get gene versions.
I asked whether they'd be willing to add gene versions.
A disadvantage of allowing any version to pulled also causes issues because we have to ensure that all required versions are available in Bionty. Yes, people can technically pull any, but not for all ontologies.
Signed-off-by: Lukas Heumos <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Partially fixes #2585
_aux
which led to import field such as the index (via_index_feature_uid
) got removed