fourzerosix/cryosparc-tune
CryoSPARC Tuning Role

This collection of roles fully tunes a CryoSPARC GPU worker node for maximum cryo-EM throughput. It is idempotent — safe to re-run after kernel updates, hardware changes, or configuration drift.

The roles are namespaced under roles/component/cryosparc_tune/ and referenced in the playbook as cryosparc_tune/<subrole>. Each sub-role is independently tagged so individual components can be applied without running the full playbook.

Developed on:

  • SuperMicro AS-4125GS-TNRT2
  • Rocky Linux 9.5
  • 8x NVIDIA L40S
  • 2.2TiB RAM
  • 27.9 TiB NVMe RAID0 (4× drives)

Purpose: Dedicated CryoSPARC worker node for cryo-EM processing

"It ain't nuttin' but a G thing, bay-bay."


Directory Structure

|__ cryosparc_tune/
   |__ group_vars/
      |__ cryosparc_workers.yml   # ALL tunable parameters (edit here)
   |__ cryosparc_tune.yml      # Main Ansible playbook
   |__ roles/
      |__ nvidia_gpu/         # GPU persistence, clocks, power limits
         |__ handlers
         |__ tasks
      |__ nvme_tuning/        # I/O scheduler, queue depth, udev rules
         |__ handlers
         |__ tasks
         |__ templates
            |__ 60-nvme-scheduler.rules.j2
      |__ kernel_tuning/      # sysctl, hugepages, GRUB cmdline, tmpfiles
         |__ handlers
         |__ tasks
         |__ templates
            |__ 90-cryosparc.conf.j2
      |__ cpu_tuning/         # tuned profile, CPU governor, NUMA
         |__ tasks
      |__ cryosparc_prep/     # /scratch cache dir, mount verification, guidance
         |__ tasks

Quick Start

# Full run
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml

# Dry run (check + diff, no changes)
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --check --diff

# Single role by tag
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --tags nvidia
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --tags nvme
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --tags kernel
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --tags cpu
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --tags cryosparc

# Verbose output for debugging
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml -vv 2>&1 | tee ansible_tune.log

After the first run, a reboot is required for GRUB cmdline changes (1GiB hugepages, transparent_hugepage=never) to take effect. Subsequent runs are fully online with no reboot needed unless you change hugepages_1g_count.


Role Reference

nvidia_gpu

Tag: nvidia

Configures all 8 L40S GPUs for sustained CryoSPARC workloads.

| Task | What it does |
| --- | --- |
| Verify nvidia-smi | Fails fast if the driver is missing |
| nvidia-persistenced | Enables the persistence-mode service — eliminates GPU cold-start latency on the first CUDA call |
| Persistence mode | `nvidia-smi -pm 1` — all GPUs |
| Compute mode | DEFAULT (mode 0) — allows multiple processes per GPU, required for CryoSPARC multi-job scheduling |
| Power limit | Set to `{{ gpu_power_limit_watts }}` W (default: 350 W) on each GPU |
| Clock lock | `--lock-gpu-clocks={{ gpu_clock_min_mhz }},{{ gpu_clock_max_mhz }}` (default: 1350–2520 MHz) — eliminates frequency throttling under sustained load |
| ECC | Optional disable via `gpu_disable_ecc: true` — recovers ~2.7 GiB VRAM per GPU but requires a reboot |
| nvidia-fabricmanager | Checked via `systemctl list-unit-files` (not present on PCIe-only L40S — safe no-op) |
| GPU inventory | Prints index, name, VRAM, persistence, power limit, and current clocks to Ansible output |
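Applied by hand, the role's GPU settings map to a handful of nvidia-smi invocations. This sketch assumes the documented defaults (350 W cap, 1350–2520 MHz clocks); it illustrates the effect, not the role's literal tasks:

```shell
# Illustrative equivalent of the role's GPU tuning (defaults assumed)
nvidia-smi -pm 1                          # persistence mode on all GPUs
nvidia-smi -c DEFAULT                     # compute mode 0: multiple processes per GPU
nvidia-smi -pl 350                        # per-GPU power cap, watts
nvidia-smi --lock-gpu-clocks=1350,2520    # pin graphics clocks, MHz

# Inventory, as printed by the role
nvidia-smi --query-gpu=index,name,memory.total,persistence_mode,power.limit,clocks.current.graphics \
  --format=csv
```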

Handlers:

  • reboot required for ecc change — fires if ECC state was changed, prints a reminder that reboot is needed

nvme_tuning

Tag: nvme

Tunes the I/O stack for the NVMe RAID0 array (md0) that backs all LVs.

| Task | What it does |
| --- | --- |
| udev rules | Deploys `/etc/udev/rules.d/60-nvme-scheduler.rules` — sets scheduler, queue depth, and read-ahead on match |
| NVMe scheduler | `none` — bypasses the kernel I/O scheduler entirely; NVMe drives have their own internal queuing |
| NVMe queue depth | `nr_requests=1024` — allows deep pipelining for large sequential I/O |
| NVMe read-ahead | `read_ahead_kb=2048` — 2 MiB read-ahead for large particle stack access patterns |
| SATA scheduler | `mq-deadline` on sda/sdb — appropriate for rotational or slower flash |
| udev reload | `udevadm control --reload-rules && udevadm trigger` |

Template variables:

nvme_scheduler: "none"
nvme_nr_requests: 1024
nvme_read_ahead_kb: 2048
sata_scheduler: "mq-deadline"
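With those defaults, the rendered rules file would look roughly like the following — a plausible rendering of `60-nvme-scheduler.rules.j2`, not necessarily the template's exact output:

```
# /etc/udev/rules.d/60-nvme-scheduler.rules (illustrative rendering)
ACTION=="add|change", KERNEL=="nvme[0-9]*n[0-9]*", ATTR{queue/scheduler}="none", ATTR{queue/nr_requests}="1024", ATTR{queue/read_ahead_kb}="2048"
ACTION=="add|change", KERNEL=="sd[ab]", ATTR{queue/scheduler}="mq-deadline"
```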

kernel_tuning

Tag: kernel

The most complex role — handles sysctl, hugepages (both 2MiB runtime and 1GiB GRUB cmdline), THP disable, and ensures tmp.mount is never masked.

sysctl (/etc/sysctl.d/90-cryosparc.conf)

| Key | Value | Reason |
| --- | --- | --- |
| vm.swappiness | 5 | 2.2 TiB RAM — swap should be a last resort only |
| vm.dirty_ratio | 5 | ~110 GiB dirty limit before writeback — sustained NVMe throughput |
| vm.dirty_background_ratio | 2 | Background writeback starts at ~44 GiB |
| vm.dirty_writeback_centisecs | 100 | Writeback every 1 s (default 5 s) |
| vm.dirty_expire_centisecs | 3000 | Expire dirty pages after 30 s |
| vm.max_map_count | 16777216 | Critical for CUDA — 8 GPUs × many VMA regions per context. Default 65536 causes CUDA launch failures |
| vm.overcommit_memory | 1 | CUDA and cryo-EM allocate large virtual address maps |
| vm.nr_hugepages | 131072 | 256 GiB of 2 MiB hugepages — also set at runtime |
| vm.nr_overcommit_hugepages | 32768 | Burst headroom for hugepage demand |
| kernel.shmmax | 1073741824000 | ~1 TiB max shared memory segment (CUDA IPC) |
| kernel.shmall | 268435456 | Total shared memory pages |
| net.core.rmem_max / wmem_max | 134217728 | 128 MiB socket buffers for GPFS/NFS throughput |
| net.ipv4.tcp_congestion_control | bbr | Better throughput on high-bandwidth links |
| fs.inotify.max_user_watches | 1048576 | CryoSPARC watches job directories continuously |
| fs.file-max | 2097152 | System-wide file descriptor limit |
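Rendered from the table, the deployed file would read approximately as follows (values from above; ordering and comments illustrative):

```
# /etc/sysctl.d/90-cryosparc.conf (excerpt, illustrative)
vm.swappiness = 5
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 3000
vm.max_map_count = 16777216
vm.overcommit_memory = 1
vm.nr_hugepages = 131072
vm.nr_overcommit_hugepages = 32768
kernel.shmmax = 1073741824000
kernel.shmall = 268435456
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_congestion_control = bbr
fs.inotify.max_user_watches = 1048576
fs.file-max = 2097152
```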

Note: kernel.sched_min_granularity_ns and kernel.sched_wakeup_granularity_ns are not available on Rocky 9's default kernel (requires CONFIG_SCHED_DEBUG, excluded from production RHEL kernels). These were removed from the template. Scheduler tuning is handled instead by the tuned throughput-performance profile.

Hugepages

2MiB hugepages are allocated at runtime via ansible.posix.sysctl — no reboot needed. A warning is printed if the system cannot allocate the full count (fragmented memory).

1GiB hugepages are written to GRUB_CMDLINE_LINUX as:

hugepagesz=1G hugepages=64 hugepagesz=2M hugepages=131072 transparent_hugepage=never

This requires one reboot to take effect. The task is idempotent — it strips any existing hugepage/THP tokens before appending, so re-runs do not duplicate kernel args.
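The strip-then-append behaviour can be sketched in plain shell. These sed expressions are hypothetical — a minimal model of the idempotency logic, not the role's actual implementation:

```shell
# Start from a cmdline that already carries stale hugepage tokens
line='GRUB_CMDLINE_LINUX="rhgb quiet hugepagesz=1G hugepages=32 transparent_hugepage=never"'

# 1. Strip any existing hugepage/THP tokens so re-runs never duplicate them
stripped=$(printf '%s\n' "$line" | sed -E 's/ ?(hugepagesz|hugepages|transparent_hugepage)=[^" ]*//g')

# 2. Append the desired tokens just before the closing quote
new=$(printf '%s\n' "$stripped" | sed 's/"$/ hugepagesz=1G hugepages=64 hugepagesz=2M hugepages=131072 transparent_hugepage=never"/')

echo "$new"
```

Running the same two steps against `$new` again yields an identical line, which is what makes repeated playbook runs safe.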

THP (Transparent Hugepages) is disabled both at runtime (/sys/kernel/mm/transparent_hugepage/enabled) and persisted via /etc/rc.d/rc.local. THP causes latency spikes and memory bloat under cryo-EM workloads — always set to never.
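The rc.local persistence amounts to something like the following (illustrative; the role's actual lines may differ):

```shell
# Appended to /etc/rc.d/rc.local — the file must be executable
# (chmod +x /etc/rc.d/rc.local) or systemd's rc-local service skips it
echo never > /sys/kernel/mm/transparent_hugepage/enabled
```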

tmp.mount — Critical Warning

DO NOT mask tmp.mount on this system.

This role explicitly unmasks tmp.mount. Masking it breaks PrivateTmp namespace sandboxing used by dbus-broker, journald, and other core systemd services — even when /tmp is backed by a dedicated LV via fstab. The fstab entry is sufficient; tmp.mount does not conflict with it.

Masking tmp.mount caused a complete boot failure on this system: dbus-broker exited with status=226/NAMESPACE → NetworkManager failed → no network → no SSH. See Known Issues.

Handlers:

  • apply sysctl — runs sysctl --system
  • rebuild grub — runs grub2-mkconfig -o /boot/grub2/grub.cfg

cpu_tuning

Tag: cpu

| Task | What it does |
| --- | --- |
| Install kernel-tools | Provides `cpupower` |
| Install tuned | System tuning daemon |
| tuned profile | Set to `throughput-performance` — disables power saving, maximises CPU throughput, handles scheduler tuning that sysctl keys cannot |
| CPU governor | `performance` via `cpupower frequency-set` — locks all cores to max frequency |
| numactl | Installed and NUMA topology printed to output for lane assignment reference |

NUMA topology (L40S on this server):

| NUMA Node | GPU Indices | PCIe Buses |
| --- | --- | --- |
| Node 0 | GPUs 0–3 | 06, 07, 46, 47 |
| Node 1 | GPUs 4–7 | 87, C3, C4, C5 |

Configure two CryoSPARC worker lanes aligned to these NUMA nodes for best memory locality. Verify with nvidia-smi topo -m.


cryosparc_prep

Tag: cryosparc

Final readiness checks and CryoSPARC-specific configuration.

| Task | What it does |
| --- | --- |
| Verify /scratch mounted | Fails with a helpful message if not — confirms the storage script ran |
| Check noatime | Warns if /scratch is missing the `noatime` mount option |
| Create cache dir | `{{ cryosparc_cache_dir }}` (default: `/scratch/cryosparc_cache`) |
| Set ownership | `chown` to `{{ cryosparc_user }}` if `cryosparc_user_manage_ownership: true` |
| restorecon | Restores SELinux context on the cache directory |
| Mount summary | Prints `df -hT` for /home /tmp /var/tmp /scratch |
| Config instructions | Prints the exact `cryosparcw connect` command with correct path and quota |
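With the defaults in this README, the printed guidance would resemble the standard `cryosparcw connect` invocation below. Hostnames and the install path are placeholders; run it as the CryoSPARC service account:

```shell
/path/to/cryosparc_worker/bin/cryosparcw connect \
  --worker cryo-worker.example.org \
  --master cryosparc-master.example.org \
  --ssdpath /scratch/cryosparc_cache \
  --ssdquota 409600
```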

Variables

All variables live in group_vars/cryosparc_workers.yml. Edit there — never hardcode values in tasks.
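For example, capping the GPUs at a lower power limit and skipping the cache-dir chown on a host where the service account doesn't exist yet is a two-line change:

```yaml
# group_vars/cryosparc_workers.yml (excerpt)
gpu_power_limit_watts: 300
cryosparc_user_manage_ownership: false
```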

GPU

| Variable | Default | Description |
| --- | --- | --- |
| gpu_count | 8 | Number of GPUs |
| gpu_model | L40S | Used in debug output |
| gpu_power_limit_watts | 350 | Per-GPU power cap (max for L40S is 350 W) |
| gpu_clock_min_mhz | 1350 | Minimum locked graphics clock |
| gpu_clock_max_mhz | 2520 | Maximum locked graphics clock (boost clock) |
| gpu_disable_ecc | false | Set true to recover ~2.7 GiB VRAM/GPU (reboot required) |

Hugepages

| Variable | Default | Description |
| --- | --- | --- |
| hugepages_2m_count | 131072 | 2 MiB hugepages = 256 GiB (runtime, no reboot) |
| hugepages_1g_count | 64 | 1 GiB hugepages = 64 GiB (GRUB cmdline, reboot required) |
| transparent_hugepage | never | THP setting — never change from `never` on cryo-EM nodes |
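A quick sanity check that the counts and the sizes quoted above agree:

```python
# Verify the hugepage arithmetic from the defaults table
MiB = 1024 ** 2
GiB = 1024 ** 3

two_mib_total = 131072 * 2 * MiB   # hugepages_2m_count x 2 MiB each
one_gib_total = 64 * 1 * GiB       # hugepages_1g_count x 1 GiB each

print(two_mib_total // GiB)  # 256 GiB of 2 MiB pages
print(one_gib_total // GiB)  # 64 GiB of 1 GiB pages
```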

CryoSPARC

| Variable | Default | Description |
| --- | --- | --- |
| cryosparc_cache_dir | /scratch/cryosparc_cache | SSD cache path |
| cryosparc_cache_quota_mb | 409600 | Cache quota in MB (400 GiB — leaves 100 GiB headroom) |
| cryosparc_user | svc_rmlcryoprd1 | OS service account that owns the cache dir |
| cryosparc_user_manage_ownership | true | Set false if the user doesn't exist yet |

NVMe I/O

| Variable | Default | Description |
| --- | --- | --- |
| nvme_scheduler | none | I/O scheduler for NVMe devices |
| nvme_nr_requests | 1024 | Queue depth per NVMe device |
| nvme_read_ahead_kb | 2048 | Read-ahead in KiB |
| sata_scheduler | mq-deadline | Scheduler for SATA devices |

CPU

| Variable | Default | Description |
| --- | --- | --- |
| tuned_profile | throughput-performance | tuned profile |
| cpu_governor | performance | cpupower frequency governor |

Kernel sysctl

| Variable | Default | Notes |
| --- | --- | --- |
| vm_swappiness | 5 | |
| vm_dirty_ratio | 5 | |
| vm_dirty_background_ratio | 2 | |
| vm_dirty_writeback_centisecs | 100 | |
| vm_dirty_expire_centisecs | 3000 | |
| vm_max_map_count | 16777216 | Must be ≥ 16M for 8-GPU CUDA workloads |
| vm_overcommit_memory | 1 | |
| fs_inotify_max_user_watches | 1048576 | |
| fs_inotify_max_user_instances | 4096 | |
| fs_file_max | 2097152 | |
| kernel_shmmax | 1073741824000 | ~1 TiB |
| kernel_shmall | 268435456 | |
| net_core_rmem_max | 134217728 | 128 MiB |
| net_core_wmem_max | 134217728 | 128 MiB |

Optimizations Applied

GPU

  • Persistence mode eliminates the ~500ms GPU initialization delay on the first CUDA call of each job — critical for CryoSPARC's short-lived GPU processes
  • Clock locking prevents the GPU from throttling during compute-bound phases and eliminates frequency ramp-up latency between jobs
  • DEFAULT compute mode allows CryoSPARC to schedule multiple jobs to the same GPU simultaneously, maximising utilisation during mixed workloads

Memory

  • 256 GiB of 2MiB hugepages pre-allocated at runtime for CUDA pinned memory and large array allocations common in CTF estimation and 2D classification
  • 64 GiB of 1GiB hugepages in the kernel cmdline for very large contiguous allocations (3D refinement volumes)
  • THP disabled — transparent hugepages cause unpredictable latency spikes when the kernel attempts to collapse/split pages during cryo-EM I/O bursts
  • vm.max_map_count=16M — the default of 65536 is insufficient for 8 GPUs under load; CUDA requires hundreds of VMA regions per context and will fail with CUDA_ERROR_OUT_OF_MEMORY or launch errors without this

Storage

  • NVMe scheduler none — modern NVMe controllers implement deep command queuing in hardware across multiple submission queues. Inserting a kernel I/O scheduler adds latency with no benefit
  • Queue depth 1024 — allows the NVMe controller to reorder and coalesce deeply pipelined requests from concurrent CryoSPARC workers
  • Read-ahead 2048K — aligns with CryoSPARC's large sequential access pattern when loading particle stacks and micrographs
  • noatime,nodiratime on /scratch — eliminates inode update writes on every SSD cache read
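Using the LV names from this system, the corresponding fstab entry for /scratch would be (illustrative — the mount options are the point):

```
/dev/mapper/system-lscratch  /scratch  xfs  defaults,noatime,nodiratime  0 0
```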

CPU / OS

  • throughput-performance tuned profile — disables CPU power saving states (C-states), sets CPU frequency scaling to max, and tunes the kernel scheduler for throughput over latency
  • performance CPU governor — all cores run at maximum frequency; avoids frequency ramp latency when CryoSPARC spawns CPU-side preprocessing workers
  • BBR congestion control — better throughput for GPFS/NFS data ingestion from bigsky on high-bandwidth links
  • Large dirty ratios — allows up to ~110 GiB of dirty write cache before kernel writeback, sustaining NVMe write throughput during movie stack imports

Known Issues & Fixes

These bugs were encountered and resolved during initial deployment. They are documented here so future administrators understand the design decisions.

1. tmp.mount masking → boot failure

Symptom: After playbook run + reboot, system came up with no network. dbus-broker failing: status=226/NAMESPACE. NetworkManager dependency failed. SSH inaccessible.

Root cause: Masking tmp.mount prevents systemd from setting up the /run/systemd/unit-root private mount namespace that dbus-broker (and many other services with PrivateTmp=yes) requires. The LV-backed /tmp via fstab is completely unrelated — systemd respects fstab mounts and does not overlay them with tmpfs. Masking tmp.mount was unnecessary and catastrophic.

Fix: kernel_tuning now runs systemctl unmask tmp.mount instead. The fstab entry alone is sufficient.

Recovery path used:

  1. Boot to init=/bin/bash selinux=0 via GRUB editor
  2. passwd root + enable PermitRootLogin yes in sshd_config
  3. Reboot → log in as root via iKVM console
  4. systemctl unmask tmp.mount && systemctl start dbus-broker
  5. systemctl start NetworkManager → SSH restored
  6. Fix fstab trailing commas, reboot cleanly

2. nvidia-fabricmanager check hard-failed

Symptom: ansible.builtin.systemd with ignore_errors: true still caused unreliable when: condition evaluation when the unit didn't exist.

Fix: Replaced with shell check:

- name: Check whether nvidia-fabricmanager unit exists
  ansible.builtin.shell: >
    systemctl list-unit-files nvidia-fabricmanager.service --no-legend
    | grep -q nvidia-fabricmanager
  register: fabricmanager_check
  changed_when: false
  failed_when: false

# later tasks are gated with: when: fabricmanager_check.rc == 0

3. sysctl keys not available on Rocky 9

Symptom: sysctl -p failed on kernel.sched_min_granularity_ns and kernel.sched_wakeup_granularity_ns.

Root cause: These keys require CONFIG_SCHED_DEBUG which is excluded from production RHEL/Rocky kernels.

Fix: Removed from 90-cryosparc.conf.j2. Added --ignore flag to sysctl -p invocation as defence-in-depth.

4. GRUB handler wrote to wrong path

Symptom: grub2-mkconfig wrote to /boot/efi/EFI/rocky/grub.cfg (the EFI wrapper file) — Rocky 9 ignores this file; it reads /boot/grub2/grub.cfg.

Fix: Hardcoded handler to grub2-mkconfig -o /boot/grub2/grub.cfg.

5. cryosparc_user defaulted to nonexistent user

Symptom: chown failed: failed to look up user cryosparc

Fix: group_vars/cryosparc_workers.yml: cryosparc_user: "svc_rmlcryoprd1"

6. SELinux autorelabel after storage reconfiguration

Symptom: touch /.autorelabel (added by 01_reconfig_storage.sh) triggered a full filesystem relabel on the first post-script reboot. New /tmp and /var/tmp LV mounts received incorrect SELinux contexts, which compounded the dbus-broker failure.

Fix: Remove /.autorelabel before rebooting after storage changes, and set SELINUX=permissive temporarily. Run restorecon -Rv /tmp /var/tmp after confirming the system boots cleanly, then restore SELINUX=enforcing.


Post-Run Verification

Run after the playbook (and reboot, if GRUB was updated):

# Storage mounts
df -hT /home /tmp /var/tmp /scratch

# GPU — persistence, clocks, power
nvidia-smi --query-gpu=index,persistence_mode,clocks.current.graphics,power.limit \
  --format=csv

# Hugepages
grep -E 'HugePages|Hugepagesize' /proc/meminfo

# THP — should show [never]
cat /sys/kernel/mm/transparent_hugepage/enabled

# NVMe scheduler — should show [none]
for d in /sys/block/nvme*n*; do
  echo "$(basename $d): $(cat $d/queue/scheduler)"
done

# tuned profile
tuned-adm active

# CPU governor — should show 'performance' for all CPUs
cpupower frequency-info -p | grep governor

# Key services
systemctl status dbus-broker NetworkManager sshd nvidia-persistenced --no-pager

# sysctl spot check
sysctl vm.max_map_count vm.swappiness vm.nr_hugepages

Reboot Requirements

| Change | Reboot needed? |
| --- | --- |
| First run (GRUB cmdline updated) | Yes — for 1 GiB hugepages and `transparent_hugepage=never` |
| Subsequent runs (no GRUB change) | No |
| hugepages_1g_count changed | Yes |
| gpu_disable_ecc: true | Yes |
| All other variable changes | No |

Extras

[admin@cryo-worker ~]$ for d in /sys/block/nvme*n*; do echo "$(basename $d): $(cat $d/queue/scheduler)"; done
nvme0n1: [none] mq-deadline kyber bfq
nvme1n1: [none] mq-deadline kyber bfq
nvme2n1: [none] mq-deadline kyber bfq
nvme3n1: [none] mq-deadline kyber bfq
[admin@cryo-worker ~]$ df -hT /home /tmp /var/tmp /scratch
Filesystem                  Type  Size  Used Avail Use% Mounted on
/dev/mapper/system-home     xfs   2.0T   15G  2.0T   1% /home
/dev/mapper/system-tmp      xfs   200G  1.5G  199G   1% /tmp
/dev/mapper/system-var_tmp  xfs   200G  1.5G  199G   1% /var/tmp
/dev/mapper/system-lscratch xfs   500G  3.6G  497G   1% /scratch
[admin@cryo-worker ~]$ nvidia-smi --query-gpu=index,name,memory.total,memory.free --format=csv
index, name, memory.total [MiB], memory.free [MiB]
0, NVIDIA L40S, 46068 MiB, 45469 MiB
1, NVIDIA L40S, 46068 MiB, 45469 MiB
2, NVIDIA L40S, 46068 MiB, 45469 MiB
3, NVIDIA L40S, 46068 MiB, 45469 MiB
4, NVIDIA L40S, 46068 MiB, 45469 MiB
5, NVIDIA L40S, 46068 MiB, 45469 MiB
6, NVIDIA L40S, 46068 MiB, 45469 MiB
7, NVIDIA L40S, 46068 MiB, 45469 MiB
[admin@cryo-worker ~]$ nvidia-smi topo -p2p r
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7
 GPU0   X       OK      OK      OK      OK      OK      OK      OK
 GPU1   OK      X       OK      OK      OK      OK      OK      OK
 GPU2   OK      OK      X       OK      OK      OK      OK      OK
 GPU3   OK      OK      OK      X       OK      OK      OK      OK
 GPU4   OK      OK      OK      OK      X       OK      OK      OK
 GPU5   OK      OK      OK      OK      OK      X       OK      OK
 GPU6   OK      OK      OK      OK      OK      OK      X       OK
 GPU7   OK      OK      OK      OK      OK      OK      OK      X

Legend:

  X    = Self
  OK   = Status Ok
  CNS  = Chipset not supported
  GNS  = GPU not supported
  TNS  = Topology not supported
  NS   = Not supported
  U    = Unknown
[admin@cryo-worker ~]$ nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PIX     NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     0-63    0               N/A
GPU1    PIX      X      NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     0-63    0               N/A
GPU2    NODE    NODE     X      PIX     SYS     SYS     SYS     SYS     SYS     SYS     0-63    0               N/A
GPU3    NODE    NODE    PIX      X      SYS     SYS     SYS     SYS     SYS     SYS     0-63    0               N/A
GPU4    SYS     SYS     SYS     SYS      X      NODE    NODE    NODE    PIX     PIX     64-127  1               N/A
GPU5    SYS     SYS     SYS     SYS     NODE     X      PIX     PIX     NODE    NODE    64-127  1               N/A
GPU6    SYS     SYS     SYS     SYS     NODE    PIX      X      PIX     NODE    NODE    64-127  1               N/A
GPU7    SYS     SYS     SYS     SYS     NODE    PIX     PIX      X      NODE    NODE    64-127  1               N/A
NIC0    SYS     SYS     SYS     SYS     PIX     NODE    NODE    NODE     X      PIX
NIC1    SYS     SYS     SYS     SYS     PIX     NODE    NODE    NODE    PIX      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
[admin@cryo-worker ~]$ lsblk
NAME                       MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda                          8:0    0 894.3G  0 disk
sdb                          8:16   0 894.3G  0 disk
nvme2n1                    259:0    0    14T  0 disk
nvme1n1                    259:1    0    14T  0 disk
nvme3n1                    259:2    0    14T  0 disk
└─nvme3n1p1                259:3    0    14T  0 part
  └─md0                      9:0    0  27.9T  0 raid0
    ├─system-root          253:0    0    50G  0 lvm   /
    ├─system-swap          253:1    0    16G  0 lvm   [SWAP]
    ├─system-var_crash     253:2    0    50G  0 lvm   /var/crash
    ├─system-var_log_audit 253:3    0   100G  0 lvm   /var/log/audit
    ├─system-var_log       253:4    0   100G  0 lvm   /var/log
    ├─system-var           253:5    0    50G  0 lvm   /var
    ├─system-home          253:6    0     2T  0 lvm   /home
    ├─system-lscratch      253:7    0   500G  0 lvm   /scratch
    ├─system-tmp           253:8    0   200G  0 lvm   /tmp
    └─system-var_tmp       253:9    0   200G  0 lvm   /var/tmp
nvme0n1                    259:4    0    14T  0 disk
├─nvme0n1p1                259:5    0   600M  0 part  /boot/efi
├─nvme0n1p2                259:6    0     2G  0 part  /boot
└─nvme0n1p3                259:7    0    14T  0 part
  └─md0                      9:0    0  27.9T  0 raid0
    ├─system-root          253:0    0    50G  0 lvm   /
    ├─system-swap          253:1    0    16G  0 lvm   [SWAP]
    ├─system-var_crash     253:2    0    50G  0 lvm   /var/crash
    ├─system-var_log_audit 253:3    0   100G  0 lvm   /var/log/audit
    ├─system-var_log       253:4    0   100G  0 lvm   /var/log
    ├─system-var           253:5    0    50G  0 lvm   /var
    ├─system-home          253:6    0     2T  0 lvm   /home
    ├─system-lscratch      253:7    0   500G  0 lvm   /scratch
    ├─system-tmp           253:8    0   200G  0 lvm   /tmp
    └─system-var_tmp       253:9    0   200G  0 lvm   /var/tmp
[admin@cryo-worker ~]$ grep -E 'HugePages|Hugepagesize' /proc/meminfo
AnonHugePages:      2048 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:   131072
HugePages_Free:    131072
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
TASK [Playbook complete — summary] **********************************************************************************************************************************************************************************************************
ok: [cryo-worker.niaid.nih.gov] => {
    "msg": [
        "================================================================",
        " Tuning complete: cryo-worker.niaid.nih.gov",
        "================================================================",
        " Applied immediately (no reboot needed):",
        "   nvidia-persistenced, clock lock, power limits",
        "   NVMe scheduler=none, nr_requests=1024",
        "   2MiB hugepages allocated, THP disabled",
        "   sysctl tuning (vm, net, fs, kernel)",
        "   tuned profile=throughput-performance, governor=performance",
        "   /scratch/cryosparc_cache created and permissioned",
        "",
        " Requires reboot:",
        "   1GiB hugepages (hugepagesz=1G in GRUB cmdline)",
        "   ECC change (only if gpu_disable_ecc: true)",
        "",
        " When ready: sudo reboot",
        "================================================================"
    ]
}

PLAY RECAP **********************************************************************************************************************************************************************************************************************************
cryo-worker.niaid.nih.gov    : ok=55   changed=8    unreachable=0    failed=0    skipped=5    rescued=0    ignored=0
