[ENH] data-type recommendation based on simplicity. #2135

Draft: wants to merge 1 commit into base: master

69 changes: 68 additions & 1 deletion src/common-principles.md
@@ -863,7 +863,74 @@ A guide for using macros can be found at
}
}
}
) }}
) }}

## Data type and modality selection
Collaborator

Thanks for clarifying the intent here! To align with your proposal to introduce a new common principle, could we retitle this heading so it directly reflects the scope? For example:

Suggested change:
- ## Data type and modality selection
+ ## BIDS' scope


BIDS is primarily for brain-related data, which should be interpreted as a research perspective rather than an inherent property of the data.
For example, motion is a data type in BIDS because it informs brain activity research,
while eye-tracking data in BIDS serves brain research perspectives.
As long as data serves as a window to the brain or supports brain-related research objectives,
it falls within BIDS' scope.
Comment on lines +870 to +874
Collaborator

Defining the project scope as a core principle is a great idea—well spotted! To keep everything consistent, could we move this definition into common-principles.yaml? Once it’s in place, we can fine-tune the exact wording alongside the other principles.


When developing BIDS Extension Proposals (BEPs) that introduce new types of data,
it is important to determine whether the data should:

1. Have its own dedicated data type and modality
2. Be incorporated under an existing data type
3. Be embedded within other modalities when appropriate
Comment on lines +876 to +881
Collaborator

It looks like this content serves more as a BEP guideline on how to implement the scope principle adequately. To maintain clear boundaries, let’s relocate it to the bids-standard/bids-website once the new scope definition is accepted. We can then adapt the wording to fit the BEP guidelines format.


### Simplicity principle
Collaborator

If we want “Simplicity” as a common principle, I suggest adding it to common-principles.yaml with a concise, broad definition. When adding new common principles, we must first evaluate their overlap, collision, and compatibility with the existing principles (for example, this one may overlap with the 80/20 rule).


Decisions about data organization SHOULD be guided by a simplicity principle
that considers how easy it is to curate, manage, and reuse data.
This principle can be quantified through the following criteria:

1. **Minimize file requirements**: Reduce the number of files needed (excluding sidecars) to describe new data

1. **Maximize reuse**: Keep data types, modalities, and entities minimal while reusing existing structures for new data
(metadata fields may change when new data is added to the specification)

1. **Avoid data type congestion**: The majority of data files in a host data type directory
should be directly related to that data type's primary purpose

1. **Balance reuse and congestion**: Balance the benefits of reusing existing structures against
the risk of overcrowding a data type with unrelated files

1. **Maintain data coherence**: Keep data with shared characteristics
(common acquisition clock, recording instrument, coordinate system) in one file
unless specific features (such as task, run, or data source) warrant separation
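
The first and third criteria above lend themselves to a rough numeric check. The following is a minimal sketch, not part of the proposal: it assumes that sidecars are the `.json` files, that the BIDS suffix is the last underscore-separated token before the extension, and that the set of "primary" suffixes for a data type is chosen by the reader. The file names, `PRIMARY_SUFFIXES`, and the helper functions are all hypothetical.

```python
from pathlib import PurePosixPath

# Assumption for this sketch: sidecars are the JSON metadata files.
SIDECAR_EXTENSIONS = {".json"}


def is_sidecar(path: str) -> bool:
    """True if the file is a metadata sidecar rather than a data file."""
    return PurePosixPath(path).suffix in SIDECAR_EXTENSIONS


def bids_suffix(path: str) -> str:
    """Return the last underscore-separated token before the extension."""
    stem = PurePosixPath(path).name.split(".", 1)[0]
    return stem.rsplit("_", 1)[-1]


def count_data_files(paths) -> int:
    """Criterion 1: number of files needed, excluding sidecars."""
    return sum(1 for p in paths if not is_sidecar(p))


def congestion_share(paths, primary_suffixes) -> float:
    """Criterion 3: share of data files tied to the host's primary purpose."""
    data = [p for p in paths if not is_sidecar(p)]
    if not data:
        return 0.0
    related = sum(1 for p in data if bids_suffix(p) in primary_suffixes)
    return related / len(data)


# Hypothetical contents of one eeg/ directory that also hosts a physio file.
eeg_dir = [
    "sub-01/eeg/sub-01_task-rest_eeg.edf",
    "sub-01/eeg/sub-01_task-rest_eeg.json",
    "sub-01/eeg/sub-01_task-rest_channels.tsv",
    "sub-01/eeg/sub-01_task-rest_events.tsv",
    "sub-01/eeg/sub-01_task-rest_physio.tsv.gz",
    "sub-01/eeg/sub-01_task-rest_physio.json",
]

# Assumption: these suffixes count as the EEG data type's primary purpose.
PRIMARY_SUFFIXES = {"eeg", "channels", "events"}

print(count_data_files(eeg_dir))                    # 4
print(congestion_share(eeg_dir, PRIMARY_SUFFIXES))  # 0.75
```

Under this reading, a share above 0.5 would satisfy "the majority of data files" in the congestion criterion. The threshold and the choice of primary suffixes remain judgment calls that the proposed text leaves open.
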
Comment on lines +885 to +902
Collaborator

Similar to the previous point, this section reads more like governance guidance than a spec requirement. Perhaps we could move it into the governance documentation, keeping the spec text lean and focused on mandatory details.


### Application guidelines

The simplicity principle SHOULD be applied early in BEP development based on:

- **File count analysis**: Consider the total number of files required to represent the data
and how this affects dataset organization

- **Usage patterns**: Evaluate whether the data is commonly used standalone
or primarily as auxiliary measurements alongside other data types

- **Technical requirements**: Assess whether the data requires specific metadata fields,
coordinate systems, or other specifications that align with existing structures

- **Community perspective**: Consider both data curator and end-user perspectives
when evaluating organizational approaches
Comment on lines +904 to +918
Collaborator

These are clearly guidelines. They should not be in the spec but in the guidelines.


### Examples

**Motion data**: Uses three core files (data, channels, events) plus sidecars,
introduces one data type and essentially one entity while reusing important resources,
avoiding congestion in other data types.

**Electrocardiogram (EKG) data**: Can serve as an example of flexible placement options:

- As a channel in EEG recordings when recorded with the same instrument
- As a physio modality in a host EEG data type for standalone EKG with different instruments
- In a dedicated physio data type (once physio-BEP is merged) when multiple physiological recordings
(eye-tracking, galvanic skin response, metabolics, SpO2, EKG) are present

The simplicity principle guides the recommendation based on the specific dataset context and usage requirements.
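
The three EKG placement options can be compared as concrete file layouts. The sketch below is illustrative only: the subject and task labels are made up, the `physio/` directory is hypothetical because the physio BEP is not merged, and `added_data_files` and `data_type_dirs` are helper names introduced here. It counts how many non-sidecar files each option adds to an EEG-only baseline and which data type directories it touches.

```python
from pathlib import PurePosixPath

# EEG-only baseline for one subject (labels are illustrative).
BASELINE = [
    "sub-01/eeg/sub-01_task-rest_eeg.edf",
    "sub-01/eeg/sub-01_task-rest_eeg.json",
    "sub-01/eeg/sub-01_task-rest_channels.tsv",
    "sub-01/eeg/sub-01_task-rest_events.tsv",
]

LAYOUTS = {
    # Option 1: EKG is one more row in channels.tsv, so no new files.
    "channel_in_eeg": list(BASELINE),
    # Option 2: a physio file hosted in the eeg/ directory.
    "physio_in_eeg_dir": BASELINE + [
        "sub-01/eeg/sub-01_task-rest_physio.tsv.gz",
        "sub-01/eeg/sub-01_task-rest_physio.json",
    ],
    # Option 3: a dedicated directory; the name "physio" is hypothetical.
    "dedicated_physio_dir": BASELINE + [
        "sub-01/physio/sub-01_task-rest_physio.tsv.gz",
        "sub-01/physio/sub-01_task-rest_physio.json",
    ],
}


def data_files(layout):
    """Non-sidecar files in a layout (sidecars assumed to be .json)."""
    return {p for p in layout if not p.endswith(".json")}


def added_data_files(layout, baseline=BASELINE) -> int:
    """Data files the layout adds on top of the EEG-only baseline."""
    return len(data_files(layout) - data_files(baseline))


def data_type_dirs(layout):
    """Data type directories used, read from the second path component."""
    return {PurePosixPath(p).parts[1] for p in layout}


for name, layout in LAYOUTS.items():
    print(name, added_data_files(layout), sorted(data_type_dirs(layout)))
```

Options 2 and 3 add the same number of data files, so file count alone does not separate them. The difference lies in whether the extra file crowds `eeg/` or introduces a new directory, which is the reuse-versus-congestion trade-off named in the criteria.
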
Comment on lines +920 to +933
Collaborator

For maximum clarity, spec examples should show actual file structures, dataset layouts, or encoding snippets. Since this is primarily illustrative guidance, it may fit better within the BEP guidelines document rather than in the core spec.


## Participant names and other labels
