@joyceyan (Contributor) commented Aug 22, 2025

Reason for Change

https://czi.atlassian.net/browse/VC-3425

Changes

obs (Cell metadata)

  • Updated the requirements for development_stage to require "na" when the development_stage_ontology_term_id is "na"
  • Updated the requirements for development_stage_ontology_term_id to require "na" when the tissue_type is "cell line"
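A minimal Python sketch of the two obs rules above, written as a per-row check. The ObsRow class and check_development_stage function are hypothetical illustrations, not the validator's actual code. The converse rule ("na" is not allowed when tissue_type is not "cell line") is left out to keep the focus on the two bullets:

```python
from dataclasses import dataclass


@dataclass
class ObsRow:
    """Hypothetical stand-in for one obs row; field names follow the schema columns."""
    tissue_type: str
    development_stage_ontology_term_id: str
    development_stage: str


def check_development_stage(row: ObsRow) -> list[str]:
    """Return an error message for each violated rule (illustrative only)."""
    errors = []
    # Second bullet: cell lines must use "na" as the development stage term ID.
    if row.tissue_type == "cell line" and row.development_stage_ontology_term_id != "na":
        errors.append("development_stage_ontology_term_id must be 'na' for a cell line")
    # First bullet: an "na" term ID requires an "na" development_stage label.
    if row.development_stage_ontology_term_id == "na" and row.development_stage != "na":
        errors.append("development_stage must be 'na' when its term ID is 'na'")
    return errors


print(check_development_stage(ObsRow("cell line", "na", "na")))          # []
print(len(check_development_stage(ObsRow("cell line", "na", "adult"))))  # 1
```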

Note that in order to do this, I had to introduce a new keyword paradigm in the column dependencies, exclude_exact and exclude_ancestors_inclusive, which are more or less the opposite of the existing match_exact and match_ancestors_inclusive. This is because when I first added the two tissue_type column dependency rules (one requiring development_stage_ontology_term_id to be "na" if tissue_type is "cell line", and the other disallowing "na" for development_stage_ontology_term_id if tissue_type is not "cell line"), the rule enforcing "if organism is none of the above" stopped kicking in for rows where, say, tissue_type = "tissue" and the organism is not human, mouse, zebrafish, fruit fly, or roundworm. That rule only kicks in if none of the column-based rules match, and the second tissue_type rule already matches those rows.
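To make the "else" problem concrete, here is a simplified Python model of column dependencies, assuming (as described above) that the column's default rule applies only when no dependency matches a row. The Condition, Dependency, and rules_for_row names are hypothetical; organisms are plain labels instead of NCBITaxon IDs; "tissue" stands in for the non-cell-line tissue types; and exclude_ancestors_inclusive is omitted because it would need ontology traversal:

```python
from dataclasses import dataclass, field


@dataclass
class Condition:
    """One column test, loosely modeled on match_exact / exclude_exact (hypothetical)."""
    column: str
    match_exact: list[str] = field(default_factory=list)    # value must be one of these, if given
    exclude_exact: list[str] = field(default_factory=list)  # value must be none of these

    def holds(self, row: dict[str, str]) -> bool:
        value = row[self.column]
        if self.match_exact and value not in self.match_exact:
            return False
        return value not in self.exclude_exact


@dataclass
class Dependency:
    name: str
    conditions: list[Condition]

    def applies(self, row: dict[str, str]) -> bool:
        # A dependency applies only when every one of its conditions holds.
        return all(c.holds(row) for c in self.conditions)


def rules_for_row(row: dict[str, str], dependencies: list[Dependency]) -> list[str]:
    """Names of the dependencies that apply; the column default is used only if none do."""
    matched = [d.name for d in dependencies if d.applies(row)]
    return matched or ["column default"]


# Before: match-only rules; the non-cell-line rule claims every "tissue" row.
before = [
    Dependency("human", [Condition("organism", match_exact=["human"])]),
    Dependency("mouse", [Condition("organism", match_exact=["mouse"])]),
    Dependency("cell line -> na", [Condition("tissue_type", match_exact=["cell line"])]),
    Dependency("not cell line -> no na", [Condition("tissue_type", match_exact=["tissue"])]),
]

# After: the "none of the above" case is an explicit rule built from exclusions.
not_cell_line = Condition("tissue_type", exclude_exact=["cell line"])
after = [
    Dependency("cell line -> na", [Condition("tissue_type", match_exact=["cell line"])]),
    Dependency("human", [Condition("organism", match_exact=["human"]), not_cell_line]),
    Dependency("mouse", [Condition("organism", match_exact=["mouse"]), not_cell_line]),
    Dependency("other organism",
               [Condition("organism", exclude_exact=["human", "mouse"]), not_cell_line]),
]

row = {"organism": "rat", "tissue_type": "tissue"}
print(rules_for_row(row, before))  # ['not cell line -> no na']
print(rules_for_row(row, after))   # ['other organism']
```

With match-only rules, the rat/tissue row is claimed by the tissue_type rule, so the none-of-the-above default never runs. Spelling the else case out with exclusions turns it into a regular dependency that matches on its own.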

I think this also makes the schema definition more explicit about the conditions under which the rule applies in the "else" case. Another option I considered was to rewrite the schema language so that column dependencies have if, elif, and else logic built in explicitly, rather than just if logic. But that would be a more involved change, since the _validate_column_dependencies function would need to be updated to parse that language, and I think that's out of scope for now.
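For comparison, the if/elif/else alternative would mean evaluating branches in order and stopping at the first match, with an explicit else. This Python sketch is hypothetical and does not reflect how _validate_column_dependencies currently works; the branch names and predicates are illustrative only:

```python
from typing import Callable

Row = dict[str, str]
Branch = tuple[str, Callable[[Row], bool]]


def first_matching_branch(row: Row, branches: list[Branch], else_name: str) -> str:
    """if/elif/else semantics: the first branch whose predicate holds wins."""
    for name, predicate in branches:
        if predicate(row):
            return name
    return else_name  # the explicit else branch


branches: list[Branch] = [
    ("cell line -> na", lambda r: r["tissue_type"] == "cell line"),  # if
    ("human", lambda r: r["organism"] == "human"),                   # elif
    ("mouse", lambda r: r["organism"] == "mouse"),                   # elif
]

print(first_matching_branch({"organism": "rat", "tissue_type": "tissue"}, branches, "other organism"))
# other organism
print(first_matching_branch({"organism": "human", "tissue_type": "cell line"}, branches, "other organism"))
# cell line -> na
```

Under these semantics the order of branches matters: putting the cell line branch first is what lets a human cell line resolve to "na".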

Testing

  • Added the unit tests test_cell_line_development_stage_ontology_term_id and test_cell_line_cannot_be_na_for_tissue

Notes for Reviewer

@joyceyan force-pushed the joyce/dev-stage-cell-line branch from e8eb2f4 to cb62ed2 on August 22, 2025 19:14
@codecov bot commented Aug 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.06%. Comparing base (3cac00e) to head (da81496).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1428      +/-   ##
==========================================
+ Coverage   88.99%   89.06%   +0.06%     
==========================================
  Files          23       23              
  Lines        2580     2596      +16     
==========================================
+ Hits         2296     2312      +16     
  Misses        284      284              
Components Coverage Δ
cellxgene_schema_cli 89.57% <100.00%> (+0.08%) ⬆️
migration_assistant 91.26% <ø> (ø)
schema_bump_dry_run_genes 79.74% <ø> (ø)
schema_bump_dry_run_ontologies 99.51% <ø> (ø)

@joyceyan changed the title from "feat: update development_stage_ontology_term_id and development_stage for 7.0" to "[wip] feat: update development_stage_ontology_term_id and development_stage for 7.0" on Aug 22, 2025
@joyceyan force-pushed the joyce/dev-stage-cell-line branch from cb62ed2 to 8b782d4 on August 25, 2025 19:02
@joyceyan requested review from Bento007 and kliu-czi on August 26, 2025 15:25
type: curie
curie_constraints:
  ontologies:
    - NA
Contributor

Does this need to be lowercase na?

@joyceyan (Contributor, Author), Aug 26, 2025

It would also work if we changed it to na, but by convention we use capitalized NA to keep it consistent with real ontologies like UBERON or CL. There are a few other fields in the schema definition where we use NA as the ontology placeholder for the string na.
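A rough Python sketch of how an NA entry in curie_constraints.ontologies could map to the literal string na, while real ontology prefixes like UBERON or CL use the usual PREFIX:number form. PLACEHOLDER_TERMS and satisfies_curie_constraints are hypothetical names, and the CURIE regex is deliberately simplified:

```python
import re

# Hypothetical: pseudo-ontologies whose only valid "terms" are literal strings.
PLACEHOLDER_TERMS: dict[str, set[str]] = {"NA": {"na"}}

# Simplified CURIE shape, e.g. "UBERON:0000105".
CURIE_PATTERN = re.compile(r"^([A-Za-z]+):(\d+)$")


def satisfies_curie_constraints(term_id: str, allowed_ontologies: list[str]) -> bool:
    """Check term_id against a curie_constraints-style ontology list (sketch)."""
    # The uppercase "NA" entry admits only the lowercase string "na".
    for ontology in allowed_ontologies:
        if term_id in PLACEHOLDER_TERMS.get(ontology, set()):
            return True
    match = CURIE_PATTERN.match(term_id)
    if match is None:
        return False
    prefix = match.group(1)
    # Real ontologies accept PREFIX:number terms; placeholders never do.
    return prefix in allowed_ontologies and prefix not in PLACEHOLDER_TERMS


print(satisfies_curie_constraints("na", ["NA"]))                  # True
print(satisfies_curie_constraints("NA", ["NA"]))                  # False
print(satisfies_curie_constraints("UBERON:0000105", ["UBERON"]))  # True
```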

@kliu-czi (Contributor) left a comment

Do we need to have a definition of what na is in this universe?

@joyceyan (Contributor, Author)

It just means "not applicable", which is different from "unknown" even though they might seem similar. This particular example is a bit hard to understand unless you already know what a cell line is, but the definition of self_reported_ethnicity_ontology_term_id makes the difference easier to see. If the organism isn't human (for example, it's a mouse cell), then there is no self-reported ethnicity, so the field must be set to na. If it is human (Homo sapiens), then unknown is allowed if that data just wasn't collected.
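The na versus unknown distinction for self_reported_ethnicity_ontology_term_id, sketched in Python. NCBITaxon:9606 is the NCBI Taxonomy ID for Homo sapiens and NCBITaxon:10090 for Mus musculus; the function name is hypothetical, ontology-term validation is skipped, and treating na as invalid for human data is an assumption of this sketch:

```python
HOMO_SAPIENS = "NCBITaxon:9606"  # NCBI Taxonomy ID for human


def ethnicity_value_allowed(organism_ontology_term_id: str, value: str) -> bool:
    """Illustrative na-vs-unknown check; real ontology-term validation is omitted."""
    if organism_ontology_term_id != HOMO_SAPIENS:
        # Not applicable: a non-human organism (e.g. a mouse cell) has no
        # self-reported ethnicity, so the only accepted value is "na".
        return value == "na"
    # Human: "unknown" covers data that simply wasn't collected. Assumption:
    # "na" is rejected here; any other value is treated as an ontology term
    # ID whose validation is out of scope for this sketch.
    return value != "na"


print(ethnicity_value_allowed("NCBITaxon:10090", "na"))       # True  (mouse)
print(ethnicity_value_allowed("NCBITaxon:10090", "unknown"))  # False
print(ethnicity_value_allowed(HOMO_SAPIENS, "unknown"))       # True
```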

@joyceyan merged commit aba4ed8 into main on Aug 26, 2025
17 checks passed
@joyceyan deleted the joyce/dev-stage-cell-line branch on August 26, 2025 17:45