Mlynarski-Group/Allen_visual_neuropixel_data_binned

Target format

  • One h5 file per stimulus type
  • Within each h5 file, different 3D tensors for each brain area and each session
  • 3D tensor shape: neurons x time x trials
  • Spiking data in binary format for 20 ms bins

Output files' structure

/ {stimulus_type}.h5
    / session_{session_id}
        / {brain_structure}
            / spike_data: xarray.DataArray (presentation_id x unit_id x time)
            / speed: xarray.DataArray (presentation_id x time)
            / pupil_data: xarray.Dataset (presentation_id x time): pupil_variables
            / presentations: xarray.Dataset (presentation_id):
                ['stimulus_block', 'start_time', 'stop_time', 'duration']
                + relevant_stimulus_parameters
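
A minimal sketch of reading one array out of this layout. It assumes the .h5 files open as an xarray DataTree through the h5netcdf engine, which is not confirmed here; `group_path` and `load_spike_data` are hypothetical helper names, and the session id is only illustrative.

```python
def group_path(session_id, brain_structure):
    """Build the in-file group path following the layout above."""
    return f"session_{session_id}/{brain_structure}"


def load_spike_data(h5_file, session_id, brain_structure):
    """Return spike_data for one session and brain structure.

    Assumption: the file can be opened as a DataTree with engine="h5netcdf".
    """
    import xarray as xr  # imported lazily; needs the xarray environment

    tree = xr.open_datatree(h5_file, engine="h5netcdf")
    node = tree[group_path(session_id, brain_structure)]
    return node["spike_data"]  # dims: presentation_id x unit_id x time


# Illustrative session id, not taken from this repository's file listing.
print(group_path(715093703, "VISp"))  # session_715093703/VISp
```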

Relevant stimulus parameters

relevant_stimulus_parameters = {
    'gabors': ['orientation', 'y_position', 'x_position'],
    'flashes': ['color'],
    'drifting_gratings': ['temporal_frequency', 'orientation'],
    'static_gratings': ['phase', 'spatial_frequency', 'orientation'],
    'natural_scenes': ['frame'],
    'natural_movie_one': ['frame'],
    'natural_movie_three': ['frame'],
    'natural_movie_one_more_repeats': ['frame'],
    'drifting_gratings_contrast': ['contrast', 'orientation'],
    'drifting_gratings_75_repeats': ['contrast', 'orientation'],
    'dot_motion': ['Speed', 'Dir']
}
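
As a sketch of how the `presentations` variables are composed, the four common columns can be joined with the per-stimulus parameters. `BASE_COLUMNS` and `presentation_columns` are hypothetical names used only for illustration, and only two dictionary entries are repeated here for brevity.

```python
# Columns shared by every stimulus type, as listed in the output structure.
BASE_COLUMNS = ['stimulus_block', 'start_time', 'stop_time', 'duration']

# Two entries copied from relevant_stimulus_parameters, for brevity.
relevant_stimulus_parameters = {
    'flashes': ['color'],
    'static_gratings': ['phase', 'spatial_frequency', 'orientation'],
}


def presentation_columns(stimulus_type):
    """Return the variables expected in `presentations` for a stimulus type."""
    return BASE_COLUMNS + relevant_stimulus_parameters[stimulus_type]


print(presentation_columns('flashes'))
# ['stimulus_block', 'start_time', 'stop_time', 'duration', 'color']
```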

Data dimensions

  • Along dimension time: spike_data, speed, pupil_data
  • Along dimension neurons: spike_data
  • Along dimension trials: spike_data, speed, pupil_data, presentations

Data

Main data files

Additional data files

Data size

  • data/01_sessions_presentations/: 11.99 GB
  • data/02_stimulus_types/: 18.82 GB
  • data/03_subset_natural_visual/: 3.19 GB
  • data/presentations/: 827.55 MB
  • data/combined_statistics.csv: 250.69 KB
  • data/sessions.csv: 13.19 KB
  • data/units.csv: 21.65 MB

Actions taken on data

Combined stimulus types

  • natural_movie_one_more_repeats and natural_movie_one

Not combined stimulus types

  • drifting_gratings and drifting_gratings_75_repeats,
    because they have different sets of stimulus parameters (brain_observatory_1.1 vs functional_connectivity)
  • drifting_gratings_contrast and drifting_gratings_75_repeats (functional_connectivity),
    because they have different durations

Filtered out stimulus types

  • spontaneous
  • shuffled movies
  • 'invalid_presentation'

Filtered out stimulus presentations

  • -1 frame values in natural scenes: no stimulus shown
  • 'null' values in gratings and dot_motion: some stimulus presentations have 'null' values in:
    • drifting_gratings
    • static_gratings
    • dot_motion
    Those are blank trials with no stimulus presented.
  • Presentations with irregular durations:
    • For each stimulus type, compute the median duration of all presentations
    • Filter out presentations with durations deviating from the median by more than:
      • 0.01 s for natural movies
      • 0.001 s for other types
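
The duration filter above could be sketched as follows. This is an illustration with made-up durations, not the repository's code; `tolerance_for` and `filter_by_median_duration` are hypothetical names, and matching natural movies by the `natural_movie` name prefix is an assumption.

```python
import numpy as np


def tolerance_for(stimulus_type):
    """Allowed deviation from the median duration, in seconds."""
    # Assumption: natural movies are identified by their name prefix.
    return 0.01 if stimulus_type.startswith("natural_movie") else 0.001


def filter_by_median_duration(durations, tolerance):
    """Return (keep mask, median) for one stimulus type's durations."""
    durations = np.asarray(durations, dtype=float)
    median = np.median(durations)
    # Keep presentations whose duration is within the tolerance of the median.
    keep = np.abs(durations - median) <= tolerance
    return keep, median


# Made-up durations: the fourth presentation is irregular.
durations = [0.2502, 0.2502, 0.2503, 0.2669, 0.2502]
keep, median = filter_by_median_duration(durations, tolerance_for("flashes"))
print(median, keep.tolist())
```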

Summary of filtered presentations

gabors: 211361 - 211282 = 79 (0.04%)
flashes: 8696 - 8694 = 2 (0.02%)
drifting_gratings: 20146 - 20129 = 17 (0.08%)
natural_movie_three: 320 - 314 = 6 (1.88%)
natural_movie_one: 640 - 628 = 12 (1.88%)
static_gratings: 191360 - 191336 = 24 (0.01%)
natural_scenes: 190362 - 190323 = 39 (0.02%)
drifting_gratings_contrast: 18360 - 18355 = 5 (0.03%)
natural_movie_one_more_repeats: 1559 - 1551 = 8 (0.51%)
drifting_gratings_75_repeats: 15600 - 15591 = 9 (0.06%)
dot_motion: 11070 - 11031 = 39 (0.35%)
docs/presentations_filtered_summary.csv
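
Reading each line above as "total - kept = removed (percent of total)", the arithmetic can be checked with a short sketch. `removed_summary` is a hypothetical helper name.

```python
def removed_summary(n_total, n_kept):
    """Return (number removed, percent of total removed, rounded to 2 places)."""
    removed = n_total - n_kept
    percent = 100.0 * removed / n_total
    return removed, round(percent, 2)


print(removed_summary(211361, 211282))  # gabors: (79, 0.04)
print(removed_summary(11070, 11031))    # dot_motion: (39, 0.35)
```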

Actual median durations used for filtering

gabors: 0.2502 s
flashes: 0.2502 s
drifting_gratings: 2.0017 s
natural_movie_three: 120.1003 s
natural_movie_one: 30.0251 s
static_gratings: 0.2502 s
natural_scenes: 0.2502 s
drifting_gratings_contrast: 0.5004 s
natural_movie_one_more_repeats: 30.0251 s
drifting_gratings_75_repeats: 2.0017 s
dot_motion: 1.0008 s
docs/presentations_median_durations.csv

Applied data processing

  • Bin and binarize spiking data into 20 ms bins, including longest stimulus presentations
  • Only count spikes within each presentation's duration (for movies)
  • Align times to stimulus onset
  • Label bins by center time
  • Align running speed to spiking bins by linear interpolation
  • Align all pupil data variables to spiking bins by linear interpolation

Notebook

The notebook contains code for accessing raw Allen SDK data and exploring the processed data.
Loading Allen SDK data requires the allensdk-env conda environment.
Loading processed data requires the xarray-env conda environment.
(See the Environments section below.)

Plots

Scripts

access_stimulus_structure.py

- Access .h5 files by stimulus type, brain structure, data type (spikes or speed)
- Save to given output path as .npy files, one per session
- Dimensions of saved arrays:
    - Spikes: neurons x presentations x time
    - Speed: presentations x time

Usage as Python module

from code.utils.access_stimulus_structure import access_stimulus_structure

access_stimulus_structure(
    stimulus,           # stimulus type to access
    structure,          # brain structure
    out_path,           # output directory for per-session .npy files
    data="spike_data",  # "spike_data" (default) or "speed"
)

Example:

access_stimulus_structure(
    stimulus="natural_movie_one",
    structure="VISp",
    out_path="data/npy_out",
    data="spike_data",
)

Usage as command line script

python code/utils/access_stimulus_structure.py \
    <stimulus> <structure> <out_path> [--data {spike_data,speed}]

Example:

python code/utils/access_stimulus_structure.py natural_movie_one VISp data/npy_out --data spike_data
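
Once the script has written its output, the per-session arrays can be read back with NumPy. The sketch below assumes only that each session is stored as a separate .npy file directly inside `out_path`; the actual file names are not specified here, so the loader keys each array by whatever file name it finds. `load_session_arrays` is a hypothetical helper name.

```python
import glob
import os

import numpy as np


def load_session_arrays(out_path):
    """Load every .npy file in out_path, keyed by file name without extension."""
    arrays = {}
    for path in sorted(glob.glob(os.path.join(out_path, "*.npy"))):
        name = os.path.splitext(os.path.basename(path))[0]
        arrays[name] = np.load(path)
    return arrays


# Spikes are expected as neurons x presentations x time,
# speed as presentations x time.
for name, array in load_session_arrays("data/npy_out").items():
    print(name, array.shape)
```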

Command-line commands

Data download

datalad clone https://github.com/Mlynarski-Group/Allen_visual_neuropixel_data_binned
OR
git clone https://github.com/Mlynarski-Group/Allen_visual_neuropixel_data_binned

cd Allen_visual_neuropixel_data_binned

datalad get /path/to/file/or/directory
OR
git annex get /path/to/file/or/directory

Environments

AllenSDK requires Python 3.9, whereas working with DataTrees requires a recent xarray (xarray > v2025.10.0), so the two are kept in separate conda environments.

conda env create -f envs/allensdk_env.yml
conda env create -f envs/xarray_env.yml

New allensdk installation

conda create -n allensdk python=3.9 pip
conda activate allensdk  # install into the new environment, not the base one
pip install allensdk
pip install ipykernel

Running heavy code for dataset creation

nohup setsid ./code/datalad_wrapper.sh & echo $! > logs/run.pid  # Start remote job with wrapper
ps -p $(cat logs/run.pid)  # Check if job is running

About

Allen Neuropixels spiking data, binned and formatted by presentation, time, and unit. The code lets you download .npy tensors for a specified stimulus type and brain structure.
