[ENH] BEP036 - Phenotypic Data Guidelines #2123
Upstream PR
Quick update before merging our PR on surchs fork
BEP036 brings best-practice guidelines for tabular phenotypic data to the BIDS specification.

- Includes an appendix called `phenotype.md`
- Includes admonitions for the guidelines in-line with modality agnostic files sections

Co-authored-by: Eric Earl <[email protected]>
Co-authored-by: Samuel Guay <[email protected]>
Co-authored-by: Sebastian Urchs <[email protected]>
Co-authored-by: Arshitha B <[email protected]>
Changed "e.g." to "for example" to follow contributing style guidelines.
for more information, see https://pre-commit.ci
each `phenotype/<measurement_tool_name>.json` data dictionary.
This improves reusability and provides clarity about the measurement tool.

### 5. Use the demographics file for common variables about participants
Copying from https://github.com/surchs/bids-specification/pull/1/files#r2103117486
For this section, would it make sense to suggest that demo-like information be prioritized in this file rather than `participants.tsv`, making the latter primarily a list of subject IDs? I haven't seen this explicitly addressed anywhere, though I'm unsure if it's something we want to formalize 😬
Something like this could follow the paragraph:
> When all demographic data is stored in `phenotype/demographics.tsv`, `participants.tsv` may serve primarily as a minimal listing of subject identifiers with only the `participant_id` column.
I agree. It'd be good to mention this.
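To make the suggestion above concrete, here is a minimal Python sketch of how a dataset curator might move demographic columns out of `participants.tsv`. The helper names `to_tsv` and `split_participants` and the example rows are hypothetical and purely illustrative. Nothing here is part of the BIDS tooling, and the split itself is only the proposal discussed in this thread, not an adopted rule.

```python
import csv
import io


def to_tsv(rows):
    """Serialize a list of rows (lists of strings) as tab-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def split_participants(tsv_text):
    """Return (minimal participants.tsv, phenotype/demographics.tsv) contents.

    Hypothetical helper: assumes the input has a header row that contains
    a `participant_id` column.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header = rows[0]
    id_index = header.index("participant_id")
    # Minimal listing: keep only the participant_id column.
    minimal = [[row[id_index]] for row in rows]
    # Demographics: participant_id first, then every other column.
    others = [i for i in range(len(header)) if i != id_index]
    demographics = [[row[id_index]] + [row[i] for i in others] for row in rows]
    return to_tsv(minimal), to_tsv(demographics)


# Made-up example data, for illustration only.
example = "participant_id\tage\tsex\nsub-01\t34\tF\nsub-02\t29\tM\n"
participants_tsv, demographics_tsv = split_participants(example)
```

With this input, `participants_tsv` holds only the `participant_id` column, while `demographics_tsv` carries the identifiers plus `age` and `sex`.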
Put the phenotypic and assessment data content where it belongs.
Fix `Text` examples to become `tsv` examples with correct tab delimiters.
Correct minor typos in words, headers, and links.
Trying to satisfy `remark-lint`.
Trying to satisfy `remark-lint`.
Added nav to BEP036 phenotype appendix.
src/appendices/phenotype.md
Outdated
| **Column name**  | **Requirement** | **Description** |
| :--------------- | :-------------- | :-------------- |
| `participant_id` | REQUIRED | MUST be the first column in the file. Note that data for one participant MAY be represented across multiple rows in case of multiple sessions or runs, and therefore the entry in the `participant_id` column will be repeated. |
| `session_id` | CONDITIONAL; if sessions are defined in the dataset | A `session_id` column MUST be added to all tabular files in the phenotype directory as soon as multiple sessions are present in the data set regardless of whether those sessions are in the `phenotype/` data, `sub-<label>/` data, or a combination of the two. |
| `run` | CONDITIONAL; if there are multiple runs within any session | A chronological `run` number is used when a measurement tool or assessment described by a tabular file was repeated within a session. |
| `acq_time` | OPTIONAL | If acquisition time is available, the `acq_time` column CAN be used to record the time of acquisition of each row in the tabular file. |
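The column rules in the table above can be sketched as a small header check. This is an illustrative Python sketch, not the BIDS validator. The function name `check_phenotype_columns` and its two boolean flags are assumptions introduced here, and the caller is assumed to already know whether the dataset defines sessions or repeated runs.

```python
def check_phenotype_columns(header, dataset_has_sessions=False, has_repeated_runs=False):
    """Return a list of human-readable problems for a phenotype TSV header.

    Hypothetical helper mirroring the table: `header` is the list of column
    names in file order; the flags describe the dataset as a whole.
    """
    problems = []
    # REQUIRED: participant_id MUST be the first column.
    if not header or header[0] != "participant_id":
        problems.append("participant_id MUST be the first column")
    # CONDITIONAL: session_id is needed once sessions exist in the dataset.
    if dataset_has_sessions and "session_id" not in header:
        problems.append("session_id is required when sessions are defined")
    # CONDITIONAL: run is needed when a tool was repeated within a session.
    if has_repeated_runs and "run" not in header:
        problems.append("run is required when runs are repeated within a session")
    # OPTIONAL: acq_time needs no check; it may be present or absent.
    return problems


ok = check_phenotype_columns(
    ["participant_id", "session_id", "score"], dataset_has_sessions=True
)
bad = check_phenotype_columns(["score", "participant_id"], dataset_has_sessions=True)
```

Returning a list of problems rather than raising on the first one lets a curator see every header issue in a file at once.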
@effigies Is there a macro we can use maybe to clean this table up? How would you recommend making this table more manageable?
Participants:
  selectors:
    - path == "/participants.tsv"
  initial_columns:
    - participant_id
  columns:
    participant_id:
      level: required
      description_addendum: |
        There MUST be exactly one row for each participant.
    species: recommended
    age: recommended
    sex: recommended
    handedness: recommended
    strain: recommended
    strain_rrid: recommended
  index_columns: [participant_id]
  additional_columns: allowed
bids-specification/src/modality-agnostic-files/data-summary-files.md
Lines 22 to 28 in ac11483
<!-- This block generates a columns table.
The definitions of these fields can be found in
  src/schema/rules/tabular_data/*.yaml
and a guide for using macros can be found at
 https://github.com/bids-standard/bids-specification/blob/master/macros_doc.md
-->
{{ MACROS___make_columns_table("modality_agnostic.Participants") }}
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2123   +/-   ##
=======================================
  Coverage   82.06%   82.06%
=======================================
  Files          17       17
  Lines        1533     1533
=======================================
  Hits         1258     1258
  Misses        275      275
Fixing a missed line break.
This is my first attempt. Hopefully it works?
Attempt 2 to satisfy CircleCI.
Attempt 3 to satisfy CircleCI.
Attempt 4? to make macro table happy in the schema.
Attempt 5? to satisfy CircleCI, etc.
Phenotypes:
  selectors:
    - extension == ".tsv"
  initial_columns:
    - participant_id
  columns:
    participant_id:
      level: required
      description_addendum: |
        MUST be the first column in the file.
        Note that data for one participant MAY be represented across multiple rows
        in case of multiple sessions or runs, and
        therefore the entry in the `participant_id` column will be repeated.
    session_id:
      level: optional
      description_addendum: |
        REQUIRED if sessions are defined in the dataset.
        A `session_id` column MUST be added to all tabular files in the phenotype directory
        as soon as multiple sessions are present in the data set
        regardless of whether those sessions are in the
        `phenotype/` data, `sub-<label>/` data, or a combination of the two.
    run:
      level: optional
      description_addendum: |
        REQUIRED if there are multiple runs within any session.
        A chronological `run` number is used when
        a measurement tool or assessment described by a tabular file
        was repeated within a session.
    acq_time:
      level: optional
      description_addendum: |
        If acquisition time is available, the `acq_time` column CAN be used
        to record the time of acquisition of each row in the tabular file.
  index_columns: [participant_id, session_id, run, acq_time]
  additional_columns: allowed
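One way to read the `index_columns` entry in the rule above is that the listed columns, taken together, should identify each row uniquely. That reading is an assumption made for this sketch, not something this thread confirms. Under that assumption, a minimal Python check for duplicate index keys in a phenotype TSV could look like the following. The function name `find_duplicate_index_rows` and the sample rows are hypothetical.

```python
import csv
import io

# Index columns as listed in the proposed Phenotypes rule.
INDEX_COLUMNS = ["participant_id", "session_id", "run", "acq_time"]


def find_duplicate_index_rows(tsv_text, index_columns=INDEX_COLUMNS):
    """Return index-key tuples that occur more than once in the table.

    Assumption: only index columns actually present in the header take
    part in the key, since most of them are optional.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    present = [c for c in index_columns if c in (reader.fieldnames or [])]
    seen = set()
    duplicates = []
    for row in reader:
        key = tuple(row[c] for c in present)
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates


# Made-up example: the third row repeats sub-01 / ses-01.
sample = (
    "participant_id\tsession_id\tscore\n"
    "sub-01\tses-01\t3\n"
    "sub-01\tses-02\t4\n"
    "sub-01\tses-01\t5\n"
)
duplicates = find_duplicate_index_rows(sample)
```

In the sample, adding a `run` column with distinct values for the two `ses-01` rows would make every key unique again, which matches the rule's description of `run` for repeated assessments within a session.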
The BEP leads can meet as-needed to discuss this BEP PR
Coordinate a meeting by emailing Eric Earl: [email protected].
Communicate on this PR to provide feedback otherwise.
HTML preview of this BEP
BEP036 brings best-practice guidelines for tabular phenotypic data to the BIDS specification.
phenotype.md
Additional Links
Co-authored-by: Eric Earl [email protected] @ericearl
Co-authored-by: Samuel Guay [email protected] @SamGuay
Co-authored-by: Sebastian Urchs [email protected] @surchs
Co-authored-by: Arshitha Basavaraj [email protected] @Arshitha