-
Notifications
You must be signed in to change notification settings - Fork 15
✨Add Schema based CELLxGENE Curator & ✨ enable curation of Feature typed indices #2878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
42e7703
24d0cfd
0082beb
b648c82
6686ee2
0c58eee
8531645
7074712
562d8ab
e96cf3e
3637584
1d468b2
c636a51
51bd805
5e1a7f2
78490c7
c1d0520
b3941d0
18bf0ae
a6dadf9
aecf9a7
f2d6fa0
603c0e4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,3 +41,14 @@ schema_version,entity,organism,source,version | |
5.2.0,Tissue,all,uberon,2024-08-07 | ||
5.2.0,Gene,human,ensembl,release-110 | ||
5.2.0,Gene,mouse,ensembl,release-110 | ||
5.3.0,CellType,all,cl,2025-02-13 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I still think the ontology versions information should be parsed from https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/5.3.0/schema.md directly. Otherwise, we are duplicating the information and also increasing the maintenance of this file. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know how to do that reliably... This is a rather free style doc. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I found https://github.com/chanzuckerberg/cellxgene-ontology-guide/blob/main/ontology-assets/ontology_info.json but it's only for 5.3.0+. It also lacks Genes just like the table in what you've shown. They have a more comprehensive set up to get gene versions. I asked whether they'd be willing to add gene versions. A disadvantage of allowing any version to pulled also causes issues because we have to ensure that all required versions are available in Bionty. Yes, people can technically pull any, but not for all ontologies. |
||
5.3.0,ExperimentalFactor,all,efo,3.75.0 | ||
5.3.0,Ethnicity,human,hancestro,3.0 | ||
5.3.0,DevelopmentalStage,human,hsapdv,2025-01-23 | ||
5.3.0,DevelopmentalStage,mouse,mmusdv,2025-01-23 | ||
5.3.0,Disease,all,mondo,2025-02-04 | ||
5.3.0,Organism,all,ncbitaxon,2024-11-25 | ||
5.3.0,Phenotype,all,pato,2025-02-01 | ||
5.3.0,Tissue,all,uberon,2025-01-15 | ||
5.3.0,Gene,human,ensembl,release-113 | ||
5.3.0,Gene,mouse,ensembl,release-113 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want a Curator anymore, unless I'm wrong. The idea would be to just provide a schema, no? There wouldn't be any specific curator code for CxG, would it?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the Curator needs custom code for missing ontology values (such as "normal"), missing defaults, handling names vs ontology_id, ...
I currently don't think that we can hack all of that into a Schema (some things we can).
I'll first add tests for the low level functionality that I changed/fixed and then I'll open a Slack thread to discuss this topic. Yesterday, I actually had moved a CxG generator function to
lamindb.examples.anndata
just to move it back. I understand what you're looking for.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function that gets the Schema can probably do some of that. I'll revisit this ASAP but first tests!
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Sorry that comment was from me)
It should still not have its Curator code, because the code is just to prepare the instance. We need a different strategy to add control terms; or we can just guide users to prepare the instance with a proper notebook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to have a few schemas available, like cxg, perturbation. Maybe even under
lamindb.schemas.
or something.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is legitimate to have some logic in code and others in a schema. Code and schema are quite different. Evidently the former can't be easily queried for because it's not structured and the later can. Not everything has to fit into a schema. Some could -- as Sunny says -- be part of a "curation notebook" which is likely better a curation script which is likely better some kind of configurable thing in lamindb (a "curator", just that this doesn't have so much to do with what we currently brand as curator which is just a thing that validates the schema).