[ENH] BEP036 - Phenotypic Data Guidelines #2123

ericearl · 2025-05-30T14:09:19Z

The BEP leads can meet as needed to discuss this BEP PR.

Coordinate a meeting by emailing Eric Earl: [email protected].

Communicate on this PR to provide feedback otherwise.

BEP036 brings best-practice guidelines for tabular phenotypic data to the BIDS specification.

- Includes an appendix called `phenotype.md`
- Includes admonitions for the guidelines in-line with modality agnostic files sections

Additional Links

Co-authored-by: Eric Earl [email protected] @ericearl
Co-authored-by: Samuel Guay [email protected] @SamGuay
Co-authored-by: Sebastian Urchs [email protected] @surchs
Co-authored-by: Arshitha Basavaraj [email protected] @Arshitha

Upstream PR

Quick update before merging our PR on surchs fork


Changed "e.g." to "for example" to follow contributing style guidelines.

for more information, see https://pre-commit.ci

src/modality-agnostic-files/data-summary-files.md

surchs · 2025-05-30T14:57:31Z

src/appendices/phenotype.md

+each `phenotype/<measurement_tool_name>.json` data dictionary.
+This improves reusability and provides clarity about the measurement tool.
+
+### 5. Use the demographics file for common variables about participants


Copying from https://github.com/surchs/bids-specification/pull/1/files#r2103117486

For this section, would it make sense to suggest that demo-like information be prioritized in this file rather than participants.tsv, making the latter primarily a list of subject IDs? I haven't seen this explicitly addressed anywhere, though I'm unsure if it's something we want to formalize 😬
Something like this could follow the paragraph?:

When all demographic data is stored in `phenotype/demographics.tsv`, `participants.tsv` may serve primarily as a minimal listing of subject identifiers with only the `participant_id` column.

I agree. It'd be good to mention this.
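
To make the suggestion above concrete, here is a minimal Python sketch of what the split could look like. It is only an illustration: the sample table, the `split_demographics` helper, and the choice of `age` and `sex` as example columns are assumptions made for this example, not something the PR defines.

```python
import csv
import io

# Hypothetical participants.tsv that still carries demographic columns.
PARTICIPANTS_TSV = "participant_id\tage\tsex\nsub-01\t34\tF\nsub-02\t29\tM\n"


def split_demographics(participants_text):
    """Return (minimal participants.tsv text, phenotype/demographics.tsv text)."""
    reader = csv.DictReader(io.StringIO(participants_text), delimiter="\t")
    columns = reader.fieldnames or ["participant_id"]
    # Keep participant_id as the first column of the demographics table.
    demo_columns = ["participant_id"] + [c for c in columns if c != "participant_id"]

    minimal = io.StringIO()
    demo = io.StringIO()
    minimal_writer = csv.writer(minimal, delimiter="\t", lineterminator="\n")
    demo_writer = csv.writer(demo, delimiter="\t", lineterminator="\n")
    minimal_writer.writerow(["participant_id"])
    demo_writer.writerow(demo_columns)
    for row in reader:
        # participants.tsv keeps only the identifier ...
        minimal_writer.writerow([row["participant_id"]])
        # ... while every demographic value moves to the demographics table.
        demo_writer.writerow([row[c] for c in demo_columns])
    return minimal.getvalue(), demo.getvalue()


minimal_text, demographics_text = split_demographics(PARTICIPANTS_TSV)
```

With this input, `minimal_text` holds only the `participant_id` column and `demographics_text` carries the same identifiers plus the demographic values.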

src/appendices/phenotype.md

src/modality-agnostic-files/data-summary-files.md

Put the phenotypic and assessment data content where it belongs.

src/modality-agnostic-files/data-summary-files.md

src/modality-agnostic-files/phenotypic-and-assessment-data.md

src/appendices/phenotype.md

src/schema/objects/files.yaml

Fix `Text` examples to become `tsv` examples with correct tab delimiters.

Correct minor typos in words, headers, and links.

src/appendices/phenotype.md

src/modality-agnostic-files/phenotypic-and-assessment-data.md

Trying to satisfy `remark-lint`.

src/modality-agnostic-files/phenotypic-and-assessment-data.md

Trying to satisfy `remark-lint`.

src/modality-agnostic-files/phenotypic-and-assessment-data.md

Added nav to BEP036 phenotype appendix.

ericearl · 2025-05-30T19:51:48Z

src/appendices/phenotype.md

+| **Column name**  | **Requirement** | **Description** |
+| :--------------- | :-------------- | :-------------- |
+| `participant_id` | REQUIRED        | MUST be the first column in the file. Note that data for one participant MAY be represented across multiple rows in case of multiple sessions or runs, and therefore the entry in the `participant_id` column will be repeated. |
+| `session_id`     | CONDITIONAL ; If sessions are defined in the dataset | A `session_id` column MUST be added to all tabular files in the phenotype directory as soon as multiple sessions are present in the data set regardless of whether those sessions are in the `phenotype/` data, `sub-<label>/` data, or a combination of the two. |
+| `run`            | CONDITIONAL ; If there are multiple runs within any session | A chronological `run` number is used when a measurement tool or assessment described by a tabular file was repeated within a session. |
+| `acq_time`       | OPTIONAL        | If acquisition time is available, the `acq_time` column CAN be used to record the time of acquisition of each row in the tabular file. |
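
The column requirements in this table can be sketched as a small check, shown here in Python. This is a rough illustration, not the BIDS validator: the `check_phenotype_columns` helper and the `dataset_has_sessions` flag are hypothetical, and treating a repeated participant/session pair as evidence of "multiple runs" is an assumption made for this example.

```python
import csv
import io


def check_phenotype_columns(tsv_text, dataset_has_sessions):
    """Return a list of problems found in one phenotype TSV (empty if none)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    rows = list(reader)
    problems = []

    if not header or header[0] != "participant_id":
        problems.append("participant_id MUST be the first column")

    if dataset_has_sessions and "session_id" not in header:
        problems.append("session_id is required because the dataset defines sessions")

    if "run" not in header:
        # Assumption: without a run column, a repeated
        # (participant_id, session_id) pair means a tool was repeated
        # within a session, which the table says needs a run number.
        seen = set()
        for row in rows:
            key = (row.get("participant_id"), row.get("session_id"))
            if key in seen:
                problems.append("run is required: %s/%s appears more than once" % key)
                break
            seen.add(key)

    return problems
```

A file with `participant_id` first, a `session_id` column, and a `run` column for repeated measurements yields an empty list; each violated rule adds one message.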


@effigies Is there a macro we can use maybe to clean this table up? How would you recommend making this table more manageable?

bids-specification/src/schema/rules/tabular_data/modality_agnostic.yaml

Lines 2 to 19 in ac11483

Participants:
  selectors:
    - path == "/participants.tsv"
  initial_columns:
    - participant_id
  columns:
    participant_id:
      level: required
      description_addendum: |
        There MUST be exactly one row for each participant.
    species: recommended
    age: recommended
    sex: recommended
    handedness: recommended
    strain: recommended
    strain_rrid: recommended
  index_columns: [participant_id]
  additional_columns: allowed

bids-specification/src/modality-agnostic-files/data-summary-files.md

Lines 22 to 28 in ac11483



{{ MACROS___make_columns_table("modality_agnostic.Participants") }}

codecov · 2025-05-30T19:53:26Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.06%. Comparing base (ac11483) to head (e62b5cc).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2123   +/-   ##
=======================================
  Coverage   82.06%   82.06%           
=======================================
  Files          17       17           
  Lines        1533     1533           
=======================================
  Hits         1258     1258           
  Misses        275      275


Fixing a missed line break.

This is my first attempt. Hopefully it works?

Attempt 2 to satisfy CircleCI.

Attempt 3 to satisfy CircleCI.

Attempt 4? to make macro table happy in the schema.

Attempt 5? to satisfy CircleCI, etc.

ericearl · 2025-06-24T19:18:28Z

src/schema/rules/tabular_data/modality_agnostic.yaml

+Phenotypes:
+  selectors:
+    - extension == ".tsv"
+  initial_columns:
+    - participant_id
+  columns:
+    participant_id:
+      level: required
+      description_addendum: |
+        MUST be the first column in the file.
+        Note that data for one participant MAY be represented across multiple rows
+        in case of multiple sessions or runs, and
+        therefore the entry in the `participant_id` column will be repeated.
+    session_id:
+      level: optional
+      description_addendum: |
+        REQUIRED if sessions are defined in the dataset.
+        A `session_id` column MUST be added to all tabular files in the phenotype directory
+        as soon as multiple sessions are present in the data set
+        regardless of whether those sessions are in the
+        `phenotype/` data, `sub-<label>/` data, or a combination of the two.
+    run:
+      level: optional
+      description_addendum: |
+        REQUIRED if there are multiple runs within any session.
+        A chronological `run` number is used when
+        a measurement tool or assessment described by a tabular file
+        was repeated within a session.
+    acq_time:
+      level: optional
+      description_addendum: |
+        If acquisition time is available, the `acq_time` column CAN be used
+        to record the time of acquisition of each row in the tabular file.
+  index_columns: [participant_id, session_id, run, acq_time]
+  additional_columns: allowed
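
One way to read the `index_columns` entry in this rule is that the listed columns together should identify each row. The Python sketch below illustrates that reading only; it is an assumption about the intent, the `duplicate_index_rows` helper is hypothetical, and it does not attempt to answer what is wrong with the rule as written.

```python
import csv
import io

# Index columns copied from the proposed Phenotypes rule.
INDEX_COLUMNS = ["participant_id", "session_id", "run", "acq_time"]


def duplicate_index_rows(tsv_text, index_columns=INDEX_COLUMNS):
    """Return (first_line, repeated_line) pairs whose index values collide."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Only index columns that are actually present can contribute to the key,
    # since session_id, run, and acq_time are optional in the rule.
    present = [c for c in index_columns if c in (reader.fieldnames or [])]
    seen = {}
    duplicates = []
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        key = tuple(row[c] for c in present)
        if key in seen:
            duplicates.append((seen[key], line_number))
        else:
            seen[key] = line_number
    return duplicates
```

Under this reading, two rows that share the same participant, session, and run (and acquisition time, if present) would be flagged as a collision.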
+


@effigies or @rwblair: What's wrong with this picture? Please help oh lads of the schema.

ericearl and others added 4 commits May 20, 2025 08:24

Merge pull request #2 from bids-standard/master

3cedc86

Upstream PR

Merge pull request #3 from bids-standard/master

11fbb47

Quick update before merging our PR on surchs fork

Update phenotype.md and data-summary-files.md

0a640e6

Changed "e.g." to "for example" to follow contributing style guidelines.

ericearl requested review from effigies and rwblair May 30, 2025 14:09

ericearl assigned surchs, ericearl and SamGuay May 30, 2025

ericearl requested review from erdalkaraca and DimitriPapadopoulos as code owners May 30, 2025 14:09

ericearl added the enhancement, BEP, and phenotype labels May 30, 2025

[pre-commit.ci] auto fixes from pre-commit.com hooks

a19512b

for more information, see https://pre-commit.ci

effigies reviewed May 30, 2025


surchs reviewed May 30, 2025
