Excessive memory use in 1.0.0-rc10 vs. 1.0.0-rc8? #836

@fcmeyer

Hello,

I have been testing different versions of fMRIPREP on the same two subjects on Vanderbilt's HPC cluster using Singularity. I asked SLURM to allocate 24 GB to each job.

Here's the SBATCH script we used for one subject:

#!/bin/tcsh
#SBATCH --nodes=1    # comments allowed
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=03:30:00
#SBATCH --mem=24G
#SBATCH --output=fpreprc10_226854.log

# setenv (not set) exports the variable to child processes such as singularity
setenv OMP_NUM_THREADS $SLURM_CPUS_PER_TASK
module load GCC Singularity
singularity run poldracklab_fmriprep_1.0.0-rc10-2017-11-10-49e327b1b660.img \
  /data/h_zald_lab/Fran/tts_test/BIDS_TTS_Test \
  /data/h_zald_lab/Fran/tts_test/out_fmriprep_RC10 \
  participant --participant_label 226854 \
  --n_cpus 8 --mem_mb 24000 --no-freesurfer \
  -w /data/h_zald_lab/Fran/tts_test/work_226854_RC10 \
  -t mid
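For reference, the script requests --mem=24G from SLURM, which SLURM treats as 24 × 1024 = 24576 MB (the limit reported in the logs below matches this), while fMRIPREP is given --mem_mb 24000. That leaves under 600 MB of slack for anything fMRIPREP does not account for. My understanding is that this setting is used by nipype to schedule tasks rather than enforced as a hard cap, so one workaround is to give fMRIPREP a smaller budget than the SLURM allocation. The Python sketch below computes such a value. The function name suggest_mem_mb and the 10% headroom are my own assumptions, not fMRIPREP defaults, and it assumes SLURM exports SLURM_MEM_PER_NODE (in MB) when --mem is set.

```python
import os


def suggest_mem_mb(slurm_mem_mb: int, headroom: float = 0.10) -> int:
    """Return a --mem_mb value that leaves `headroom` (a fraction) of the
    SLURM allocation unused. The 10% default is an arbitrary assumption."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    return int(slurm_mem_mb * (1 - headroom))


if __name__ == "__main__":
    # Assumption: SLURM_MEM_PER_NODE holds the --mem request in MB.
    # Fall back to the 24G (24576 MB) request from the script above.
    slurm_mem = int(os.environ.get("SLURM_MEM_PER_NODE", 24 * 1024))
    print(suggest_mem_mb(slurm_mem))
```

For a 24G allocation this gives 22118. Headroom only helps if the real peak stays close to fMRIPREP's own estimate; it would not fix a genuine memory regression.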

The RC8 script was identical, except that it used the RC8 image and had RC8 instead of RC10 in the output names so we could compare the results later. We ran this for two subjects in total.

While RC8 ran without a hitch, RC10 had two issues. First, the warning mentioned in my Neurostars post was repeated over and over throughout the log:

171110-02:11:12,400 interface WARNING:
Affines of input and reference images do not match, CopyXForm will probably make the input image useless.
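Because this warning makes up most of the log, it can help to strip it out before reading or sharing a log tail. A minimal Python sketch, assuming each occurrence is a two-line record (an "interface WARNING:" header followed by the message line), as in the excerpts in this issue; drop_copyxform_warnings is a hypothetical helper, not part of fMRIPREP or nipype:

```python
import re
from typing import Iterable, Iterator

# Header line printed before each warning, e.g.
# "171110-02:11:12,400 interface WARNING:" (format taken from the logs above)
HEADER = re.compile(r"^\d{6}-\d{2}:\d{2}:\d{2},\d+ interface WARNING:\s*$")
MESSAGE = "Affines of input and reference images do not match"


def drop_copyxform_warnings(lines: Iterable[str]) -> Iterator[str]:
    """Yield log lines, skipping each two-line CopyXForm affine warning."""
    pending = None  # a WARNING header whose message we have not seen yet
    for line in lines:
        if pending is not None:
            if MESSAGE in line:
                pending = None  # drop both the header and the message
                continue
            yield pending  # some other warning: keep its header
            pending = None
        if HEADER.match(line):
            pending = line
            continue
        yield line
    if pending is not None:
        yield pending  # header at the very end of the input
```

Passing an open log file (for example fpreprc10_226854.log) through this generator and writing the result to a new file keeps every other message, including warnings with a different text.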

More concerning, SLURM killed the pipeline for both subjects for exceeding the memory limit. I am including only the tails of the logs (subject 108992 first, then 226854), because the full files are very long given the sheer number of warnings:

171116-17:29:39,944 niworkflows INFO:
	 Generating report for aCompCor. file "/data/h_zald_lab/Fran/tts_test/work_108992_RC10/fmriprep_wf/single_subject_108992_wf/func_preproc_task_mid_run_01_wf/bold_bold_trans_wf/merge/vol0000_xform-00000_merged.nii.gz", mask "/data/h_zald_lab/Fran/tts_test/work_108992_RC10/fmriprep_wf/single_subject_108992_wf/func_preproc_task_mid_run_01_wf/bold_confounds_wf/acc_tfm/highres001_BrainExtractionBrain_prob_0_tpmsum_roi_trans_boldmsk.nii.gz"
171116-17:29:53,729 niworkflows INFO:
	 Successfully created report (/data/h_zald_lab/Fran/tts_test/work_108992_RC10/fmriprep_wf/single_subject_108992_wf/func_preproc_task_mid_run_01_wf/bold_confounds_wf/acompcor/report.html)
171116-17:29:54,76 interface WARNING:
	 Affines of input and reference images do not match, CopyXForm will probably make the input image useless.
171116-17:59:02,201 niworkflows INFO:
	 Successful spatial normalization (retry #0).
171116-17:59:02,203 niworkflows INFO:
	 Report - setting fixed (/data/h_zald_lab/Fran/tts_test/work_108992_RC10/fmriprep_wf/single_subject_108992_wf/anat_preproc_wf/t1_2_mni/fixed_masked.nii.gz) and moving (/data/h_zald_lab/Fran/tts_test/work_108992_RC10/fmriprep_wf/single_subject_108992_wf/anat_preproc_wf/t1_2_mni/ants_t1_to_mni_Warped.nii.gz) images
171116-17:59:02,203 niworkflows INFO:
	 Generating visual report
171116-17:59:23,657 niworkflows INFO:
	 Successfully created report (/data/h_zald_lab/Fran/tts_test/work_108992_RC10/fmriprep_wf/single_subject_108992_wf/anat_preproc_wf/t1_2_mni/report.svg)
slurmstepd: error: Job 21317166 exceeded memory limit (26401148 > 25165824), being killed
slurmstepd: error: Exceeded job memory limit
slurmstepd: error: *** JOB 21317166 ON vmp1312 CANCELLED AT 2017-11-16T12:05:08 ***

	 Generating report for aCompCor. file "/data/h_zald_lab/Fran/tts_test/work_226854_RC10/fmriprep_wf/single_subject_226854_wf/func_preproc_task_mid_run_01_wf/bold_bold_trans_wf/merge/vol0000_xform-00000_merged.nii.gz", mask "/data/h_zald_lab/Fran/tts_test/work_226854_RC10/fmriprep_wf/single_subject_226854_wf/func_preproc_task_mid_run_01_wf/bold_confounds_wf/acc_tfm/highres001_BrainExtractionBrain_prob_0_tpmsum_roi_trans_boldmsk.nii.gz"
171116-17:54:16,39 niworkflows INFO:
	 Successfully created report (/data/h_zald_lab/Fran/tts_test/work_226854_RC10/fmriprep_wf/single_subject_226854_wf/func_preproc_task_mid_run_01_wf/bold_confounds_wf/acompcor/report.html)
171116-17:54:17,378 interface WARNING:
	 Affines of input and reference images do not match, CopyXForm will probably make the input image useless.
171116-18:34:26,585 niworkflows INFO:
	 Successful spatial normalization (retry #0).
171116-18:34:26,590 niworkflows INFO:
	 Report - setting fixed (/data/h_zald_lab/Fran/tts_test/work_226854_RC10/fmriprep_wf/single_subject_226854_wf/anat_preproc_wf/t1_2_mni/fixed_masked.nii.gz) and moving (/data/h_zald_lab/Fran/tts_test/work_226854_RC10/fmriprep_wf/single_subject_226854_wf/anat_preproc_wf/t1_2_mni/ants_t1_to_mni_Warped.nii.gz) images
171116-18:34:26,590 niworkflows INFO:
	 Generating visual report
171116-18:35:00,936 niworkflows INFO:
	 Successfully created report (/data/h_zald_lab/Fran/tts_test/work_226854_RC10/fmriprep_wf/single_subject_226854_wf/anat_preproc_wf/t1_2_mni/report.svg)
slurmstepd: error: Job 21317317 exceeded memory limit (26395460 > 25165824), being killed
slurmstepd: error: *** JOB 21317317 ON vmp452 CANCELLED AT 2017-11-16T12:42:27 ***
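The slurmstepd figures appear to be in KiB: the limit, 25165824, is exactly 24 × 1024 × 1024, which matches --mem=24G. A small Python sketch that parses these lines and converts them to GiB; the regular expression is written against the two messages above, and parse_oom is a hypothetical name:

```python
import re
from typing import Optional, Tuple

# Matches lines such as:
# "slurmstepd: error: Job 21317166 exceeded memory limit (26401148 > 25165824), being killed"
KILL_RE = re.compile(r"Job (\d+) exceeded memory limit \((\d+) > (\d+)\)")


def parse_oom(line: str) -> Optional[Tuple[int, float, float]]:
    """Return (job_id, used_gib, limit_gib), or None if the line doesn't match.
    Assumes slurmstepd reports both figures in KiB."""
    m = KILL_RE.search(line)
    if m is None:
        return None
    job, used_kib, limit_kib = (int(g) for g in m.groups())
    kib_per_gib = 1024 ** 2
    return job, used_kib / kib_per_gib, limit_kib / kib_per_gib
```

For job 21317166 this gives about 25.18 GiB used against a 24.0 GiB limit, and for job 21317317 about 25.17 GiB, so both jobs were roughly 1.2 GiB over the allocation (and above the 24000 MB passed to --mem_mb) when they were killed.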

I am not sure whether the minimum memory requirements have increased since rc8, or whether a bug is causing excessive memory use or making fMRIPREP ignore the limit passed with --mem_mb. Either way, I figured I'd bring it up in case others are experiencing the same problem.

Thank you so much for developing this, I really like this tool!
