[ENH] data-type recommendation based on simplicity. #2135

Draft pull request: wants to merge 1 commit into master

@neuromechanist (Member) commented Jun 10, 2025

Following a discussion in the BIDS maintainer meeting on June 10 on #2108 (defining criteria for data types and modalities), @yarikoptic suggested creating proposals to be considered by the community.

This proposal is compatible with the decisions made so far, and might help lay a path forward.

I wrote up my response to the issue as a draft PR for better visibility.

Please make suggestions, changes, and edits regarding this proposal here, and keep the general discussion under #2108.

codecov bot commented Jun 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.15%. Comparing base (daad867) to head (2f4549e).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2135      +/-   ##
==========================================
- Coverage   82.19%   82.15%   -0.05%     
==========================================
  Files          17       17              
  Lines        1528     1530       +2     
==========================================
+ Hits         1256     1257       +1     
- Misses        272      273       +1     


@oesteban (Collaborator) left a comment

Thank you for putting this proposal together! I’ve noticed that some sections mix the formal specification with decision-making and guideline material, which can make it harder for readers to distinguish between what must be implemented and how the BEP process works.

Proposed approach:

  1. Keep the spec focused on technical requirements and concrete examples.
  2. If the common principles need changes (e.g., defining BIDS' scope precisely or adding the proposed simplicity principle), propose them explicitly as changes to the spec's common-principles.yaml file, and address any interaction or conflict with existing principles.
  3. Move governance and guideline content (BEP-specific advice) into its appropriate home (bids-standard/bids-website, see bids-standard/bids-website#668).

I appreciate the thoughtful suggestions here and believe this separation will make both the spec and the guiding documentation clearer and easier to maintain.

Comment on lines +870 to +874
BIDS is primarily for brain-related data, which should be interpreted as a research perspective rather than an inherent property of the data.
For example, motion is a data type in BIDS because it informs brain activity research,
while eye-tracking data in BIDS serves brain research perspectives.
As long as data serves as a window to the brain or supports brain-related research objectives,
it falls within BIDS' scope.

Collaborator comment:

Defining the project scope as a core principle is a great idea—well spotted! To keep everything consistent, could we move this definition into common-principles.yaml? Once it’s in place, we can fine-tune the exact wording alongside the other principles.

Comment on lines +885 to +902
Decisions about data organization SHOULD be guided by a simplicity principle
that considers how easy it is to curate, manage, and reuse data.
This principle can be quantified through the following criteria:

1. **Minimize file requirements**: Reduce the number of files needed (excluding sidecars) to describe new data

1. **Maximize reuse**: Keep data types, modalities, and entities minimal while reusing existing structures for new data
(metadata fields may change when new data is added to the specification)

1. **Avoid data type congestion**: The majority of data files in a host data type directory
should be directly related to that data type's primary purpose

1. **Balance reuse and congestion**: Balance the benefits of reusing existing structures against
the risk of overcrowding a data type with unrelated files

1. **Maintain data coherence**: Keep data with shared characteristics
(common acquisition clock, recording instrument, coordinate system) in one file
unless specific features (such as task, run, or data source) warrant separation

Collaborator comment:

Similar to the previous point, this section reads more like governance guidance than a spec requirement. Perhaps we could move it into the governance documentation, keeping the spec text lean and focused on mandatory details.
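
Wherever this text ends up, the "avoid data type congestion" criterion quoted above can be made concrete. The following is a minimal sketch, not part of the proposal: the helper names, the example paths, and the choice of which suffixes count as "directly related" to the EEG data type are all assumptions made for illustration. It treats `.json` files as sidecars, following the "excluding sidecars" wording of the first criterion.

```python
from pathlib import PurePosixPath

# Assumption: only JSON files are treated as sidecars.
SIDECAR_EXTENSIONS = {".json"}


def suffix_of(path):
    """Return the BIDS-style suffix: the last underscore-separated token before the extension."""
    name = PurePosixPath(path).name
    stem = name.split(".", 1)[0]
    return stem.rsplit("_", 1)[-1]


def is_sidecar(path):
    """True if the file's final extension marks it as a sidecar."""
    return PurePosixPath(path).suffix in SIDECAR_EXTENSIONS


def congestion_ratio(paths, primary_suffixes):
    """Fraction of non-sidecar files whose suffix belongs to the host data type."""
    data_files = [p for p in paths if not is_sidecar(p)]
    if not data_files:
        return 0.0
    related = [p for p in data_files if suffix_of(p) in primary_suffixes]
    return len(related) / len(data_files)


# Hypothetical eeg/ directory that also hosts one physio recording.
eeg_dir = [
    "sub-01/eeg/sub-01_task-rest_eeg.edf",
    "sub-01/eeg/sub-01_task-rest_eeg.json",
    "sub-01/eeg/sub-01_task-rest_channels.tsv",
    "sub-01/eeg/sub-01_task-rest_events.tsv",
    "sub-01/eeg/sub-01_task-rest_physio.tsv.gz",
    "sub-01/eeg/sub-01_task-rest_physio.json",
]

# Assumption: these suffixes are "directly related" to the EEG data type.
ratio = congestion_ratio(eeg_dir, {"eeg", "channels", "events"})
print(f"{ratio:.2f}")  # 3 of the 4 non-sidecar files are EEG-related -> 0.75
```

A ratio above 0.5 would satisfy the "majority" wording of the criterion; where exactly to draw the line is a governance decision rather than something the sketch can settle.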

2. Be incorporated under an existing data type
3. Be embedded within other modalities when appropriate

### Simplicity principle

Collaborator comment:

If we want “Simplicity” as a common principle, I suggest adding it to common-principles.yaml with a concise, broad definition. When adding new common principles, we must make sure we have evaluated their overlap, conflict, and compatibility with the existing principles (e.g., this one may overlap with the 80/20 rule).

Comment on lines +904 to +918
### Application guidelines

The simplicity principle SHOULD be applied early in BEP development based on:

- **File count analysis**: Consider the total number of files required to represent the data
and how this affects dataset organization

- **Usage patterns**: Evaluate whether the data is commonly used standalone
or primarily as auxiliary measurements alongside other data types

- **Technical requirements**: Assess whether the data requires specific metadata fields,
coordinate systems, or other specifications that align with existing structures

- **Community perspective**: Consider both data curator and end-user perspectives
when evaluating organizational approaches

Collaborator comment:

These are clearly guidelines. They belong in the BEP guidelines, not in the spec.

Comment on lines +920 to +933
### Examples

**Motion data**: Uses three core files (data, channels, events) plus sidecars,
introduces one data type and essentially one entity while reusing important resources,
avoiding congestion in other data types.

**Electrocardiogram (EKG) data**: Can serve as an example of flexible placement options:

- As a channel in EEG recordings when recorded with the same instrument
- As a physio modality in a host EEG data type for standalone EKG with different instruments
- In a dedicated physio data type (once physio-BEP is merged) when multiple physiological recordings
(eye-tracking, galvanic skin response, metabolics, SpO2, EKG) are present

The simplicity principle guides the recommendation based on the specific dataset context and usage requirements.

Collaborator comment:

For maximum clarity, spec examples should show actual file structures, dataset layouts, or encoding snippets. Since this is primarily illustrative guidance, it may fit better within the BEP guidelines document rather than in the core spec.
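
To illustrate what a layout-based example could look like, here is a rough sketch of the three EKG placement options as file listings, with a file count for each. Every path is hypothetical, and the `physio/` directory name in the third layout is an assumption, since the physio BEP is not merged. The count excludes `.json` sidecars, following the wording of the first simplicity criterion.

```python
from pathlib import PurePosixPath

# Files shared by all three hypothetical layouts: one EEG recording.
EEG_FILES = [
    "sub-01/eeg/sub-01_task-rest_eeg.edf",
    "sub-01/eeg/sub-01_task-rest_eeg.json",
    "sub-01/eeg/sub-01_task-rest_channels.tsv",
]

# All layouts are illustrative assumptions, not text from the PR or the spec.
LAYOUTS = {
    # Option 1: EKG is just another row in channels.tsv of the EEG recording.
    "ekg_as_eeg_channel": list(EEG_FILES),
    # Option 2: EKG is a separate physio file hosted in the eeg/ directory.
    "physio_file_in_eeg": EEG_FILES + [
        "sub-01/eeg/sub-01_task-rest_physio.tsv.gz",
        "sub-01/eeg/sub-01_task-rest_physio.json",
    ],
    # Option 3: EKG lives in a dedicated (assumed) physio/ directory.
    "dedicated_physio_dir": EEG_FILES + [
        "sub-01/physio/sub-01_task-rest_physio.tsv.gz",
        "sub-01/physio/sub-01_task-rest_physio.json",
    ],
}


def summarize(paths):
    """Count non-sidecar files and distinct directories in a layout."""
    data_files = [p for p in paths if not p.endswith(".json")]
    directories = {str(PurePosixPath(p).parent) for p in paths}
    return {"data_files": len(data_files), "directories": len(directories)}


SUMMARY = {name: summarize(paths) for name, paths in LAYOUTS.items()}
for name, stats in SUMMARY.items():
    print(name, stats)
```

Under these assumptions the first option needs the fewest files, while the second and third need the same number of files and differ only in where the physio recording lives, which is the reuse-versus-congestion trade-off the proposal describes.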

) }}
) }}

## Data type and modality selection

Collaborator comment:

Thanks for clarifying the intent here! To align with your proposal to introduce a new common principle, could we retitle this heading so it directly reflects the scope? For example:

Suggested change
- ## Data type and modality selection
+ ## BIDS' scope

Comment on lines +876 to +881
When developing BIDS Extension Proposals (BEPs) that introduce new types of data,
it is important to determine whether the data should:

1. Have its own dedicated data type and modality
2. Be incorporated under an existing data type
3. Be embedded within other modalities when appropriate

Collaborator comment:

It looks like this content serves more as a BEP guideline on how to implement the scope principle adequately. To maintain clear boundaries, let’s relocate it to the bids-standard/bids-website once the new scope definition is accepted. We can then adapt the wording to fit the BEP guidelines format.
