[ENH] data-type recommendation based on simplicity. #2135
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #2135 +/- ##
==========================================
- Coverage 82.19% 82.15% -0.05%
==========================================
Files 17 17
Lines 1528 1530 +2
==========================================
+ Hits 1256 1257 +1
- Misses 272 273 +1
Thank you for putting this proposal together! I’ve noticed that some sections mix the formal specification with decision-making and guideline material, which can make it harder for readers to distinguish between what must be implemented and how the BEP process works.
Proposed approach:
- Keep the spec focused on technical requirements and concrete examples.
- If the common principles require modification (e.g., defining BIDS' scope precisely or the proposed simplicity principle), propose the changes as such, within the `common-principles.yaml` file of the spec, and address potential interactions or conflicts with existing principles.
- Move governance and guideline content (BEP-specific advice) into its appropriate home (bids-standard/bids-website, see bids-standard/bids-website#668).
I appreciate the thoughtful suggestions here and believe this separation will make both the spec and the guiding documentation clearer and easier to maintain.
BIDS is primarily for brain-related data, which should be interpreted as a research perspective rather than an inherent property of the data.
For example, motion is a data type in BIDS because it informs brain activity research,
while eye-tracking data in BIDS serves brain research perspectives.
As long as data serves as a window to the brain or supports brain-related research objectives,
it falls within BIDS' scope.
Defining the project scope as a core principle is a great idea; well spotted! To keep everything consistent, could we move this definition into `common-principles.yaml`? Once it’s in place, we can fine-tune the exact wording alongside the other principles.
Decisions about data organization SHOULD be guided by a simplicity principle
that considers how easy it is to curate, manage, and reuse data.
This principle can be quantified through the following criteria:

1. **Minimize file requirements**: Reduce the number of files needed (excluding sidecars) to describe new data

1. **Maximize reuse**: Keep data types, modalities, and entities minimal while reusing existing structures for new data
(metadata fields may change when new data is added to the specification)

1. **Avoid data type congestion**: The majority of data files in a host data type directory
should be directly related to that data type's primary purpose

1. **Balance reuse and congestion**: Balance the benefits of reusing existing structures against
the risk of overcrowding a data type with unrelated files

1. **Maintain data coherence**: Keep data with shared characteristics
(common acquisition clock, recording instrument, coordinate system) in one file
unless specific features (such as task, run, or data source) warrant separation
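The first and third criteria above lend themselves to a rough numeric check. The following is a minimal sketch, not part of the proposal: the helper names (`is_sidecar`, `bids_suffix`, `count_data_files`, `primary_share`), the assumption that sidecars are exactly the `.json` files, and the example filenames are all hypothetical and chosen only for illustration.

```python
from pathlib import PurePosixPath


def is_sidecar(path):
    # Assumption: only JSON files count as sidecars.
    return PurePosixPath(path).name.endswith(".json")


def bids_suffix(path):
    # The suffix is the last underscore-separated token before the extension,
    # e.g. "motion" in "sub-01_task-walk_tracksys-imu_motion.tsv".
    name = PurePosixPath(path).name
    stem = name.split(".", 1)[0]
    return stem.rsplit("_", 1)[-1]


def count_data_files(paths):
    # Criterion 1: number of files needed, excluding sidecars.
    return sum(1 for p in paths if not is_sidecar(p))


def primary_share(paths, primary_suffixes):
    # Criterion 3: fraction of non-sidecar files whose suffix belongs to the
    # data type's primary purpose. An empty directory is treated as uncongested.
    data = [p for p in paths if not is_sidecar(p)]
    if not data:
        return 1.0
    related = sum(1 for p in data if bids_suffix(p) in primary_suffixes)
    return related / len(data)


# Hypothetical contents of a motion data type directory.
files = [
    "sub-01_task-walk_tracksys-imu_motion.tsv",
    "sub-01_task-walk_tracksys-imu_motion.json",
    "sub-01_task-walk_tracksys-imu_channels.tsv",
    "sub-01_task-walk_events.tsv",
    "sub-01_task-walk_physio.tsv.gz",
]

n_files = count_data_files(files)
share = primary_share(files, {"motion", "channels", "events"})
```

Under this reading, a directory satisfies the "majority" wording of the congestion criterion when `share` exceeds 0.5; where to place that threshold is itself an assumption that the proposal leaves open.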
Similar to the previous point, this section reads more like governance guidance than a spec requirement. Perhaps we could move it into the governance documentation, keeping the spec text lean and focused on mandatory details.
2. Be incorporated under an existing data type
3. Be embedded within other modalities when appropriate

### Simplicity principle
If we want “Simplicity” as a common principle, I suggest adding it to `common-principles.yaml` with a concise, broad definition. When adding new common principles, we must make sure we have evaluated their overlap, collision, or compatibility with the existing principles (e.g., this one could overlap with the 80/20 rule).
### Application guidelines

The simplicity principle SHOULD be applied early in BEP development based on:

- **File count analysis**: Consider the total number of files required to represent the data
and how this affects dataset organization

- **Usage patterns**: Evaluate whether the data is commonly used standalone
or primarily as auxiliary measurements alongside other data types

- **Technical requirements**: Assess whether the data requires specific metadata fields,
coordinate systems, or other specifications that align with existing structures

- **Community perspective**: Consider both data curator and end-user perspectives
when evaluating organizational approaches
This is clearly guideline material. It belongs in the guidelines, not in the spec.
### Examples

**Motion data**: Uses three core files (data, channels, events) plus sidecars,
introduces one data type and essentially one entity while reusing important resources,
avoiding congestion in other data types.

**Electrocardiogram (EKG) data**: Can serve as an example of flexible placement options:

- As a channel in EEG recordings when recorded with the same instrument
- As a physio modality in a host EEG data type for standalone EKG with different instruments
- In a dedicated physio data type (once physio-BEP is merged) when multiple physiological recordings
(eye-tracking, galvanic skin response, metabolics, SpO2, EKG) are present

The simplicity principle guides the recommendation based on the specific dataset context and usage requirements.
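The three EKG placement options above can be read as a small decision rule. The sketch below is one possible interpretation, not the proposal's algorithm: the function name `recommend_ekg_placement`, its parameters, and the reading of "multiple physiological recordings" as "more than one" are assumptions made for illustration.

```python
def recommend_ekg_placement(same_instrument_as_eeg,
                            n_physio_recordings,
                            physio_datatype_available):
    """Return one of the three placement options listed in the example.

    All parameters are hypothetical inputs describing the dataset:
    - same_instrument_as_eeg: EKG was recorded by the EEG instrument
    - n_physio_recordings: number of physiological recordings present
    - physio_datatype_available: a dedicated physio data type exists
      (that is, the physio BEP has been merged)
    """
    if same_instrument_as_eeg:
        return "channel in the EEG recording"
    # Assumption: "multiple" means more than one physiological recording.
    if n_physio_recordings > 1 and physio_datatype_available:
        return "dedicated physio data type"
    return "physio modality in the host EEG data type"


# Standalone EKG from a separate instrument, no other physiological data.
placement = recommend_ekg_placement(False, 1, False)
```

A real recommendation would also weigh the file-count and congestion criteria, which this rule deliberately leaves out.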
For maximum clarity, spec examples should show actual file structures, dataset layouts, or encoding snippets. Since this is primarily illustrative guidance, it may fit better within the BEP guidelines document rather than in the core spec.
) }}
) }}

## Data type and modality selection
Thanks for clarifying the intent here! To align with your proposal to introduce a new common principle, could we retitle this heading so it directly reflects the scope? For example:
## Data type and modality selection
## BIDS' scope
When developing BIDS Extension Proposals (BEPs) that introduce new types of data,
it is important to determine whether the data should:

1. Have its own dedicated data type and modality
2. Be incorporated under an existing data type
3. Be embedded within other modalities when appropriate
It looks like this content serves more as a BEP guideline on how to implement the scope principle adequately. To maintain clear boundaries, let’s relocate it to the bids-standard/bids-website once the new scope definition is accepted. We can then adapt the wording to fit the BEP guidelines format.
Following a discussion in the BIDS maintainer meeting on June 10 on #2108 (defining criteria for data types and modalities), @yarikoptic suggested creating proposals to be considered by the community.
This proposal is compatible with the decisions made so far, and might help lay a path forward.
I formed my response to the issue as a dark PR for better visibility.
Please make suggestions, changes, and edits regarding this proposal here. Please keep the general discussion under #2108.
Note
The rendered version is available here: https://bids-specification--2135.org.readthedocs.build/en/2135/common-principles.html#data-type-and-modality-selection