fourzerosix/cryosparc-tune
CryoSPARC Tuning Role

This collection of roles fully tunes a CryoSPARC GPU worker node for maximum cryo-EM throughput. It is idempotent — safe to re-run after kernel updates, hardware changes, or configuration drift.

The roles are namespaced under roles/component/cryosparc_tune/ and referenced in the playbook as cryosparc_tune/<subrole>. Each sub-role is independently tagged so individual components can be applied without running the full playbook.

Developed on:

  • SuperMicro AS-4125GS-TNRT2
  • Rocky Linux 9.5
  • 8x NVIDIA L40S
  • 2.2TiB RAM
  • 27.9 TiB NVMe RAID0 (4× drives)

Purpose: Dedicated CryoSPARC worker node for cryo-EM processing

"It ain't nuttin' but a G thing, bay-bay."


Directory Structure

|__ cryosparc_tune/
   |__ group_vars/
      |__ cryosparc_workers.yml   # ALL tunable parameters (edit here)
   |__ cryosparc_tune.yml      # Main Ansible playbook
   |__ roles/
      |__ nvidia_gpu/         # GPU persistence, clocks, power limits
         |__ handlers
         |__ tasks
      |__ nvme_tuning/        # I/O scheduler, queue depth, udev rules
         |__ handlers
         |__ tasks
         |__ templates
            |__ 60-nvme-scheduler.rules.j2
      |__ kernel_tuning/      # sysctl, hugepages, GRUB cmdline, tmpfiles
         |__ handlers
         |__ tasks
         |__ templates
            |__ 90-cryosparc.conf.j2
      |__ cpu_tuning/         # tuned profile, CPU governor, NUMA
         |__ tasks
      |__ cryosparc_prep/     # /scratch cache dir, mount verification, guidance
         |__ tasks

Quick Start

# Full run
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml

# Dry run (check + diff, no changes)
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --check --diff

# Single role by tag
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --tags nvidia
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --tags nvme
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --tags kernel
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --tags cpu
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --tags cryosparc

# Verbose output for debugging
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml -vv 2>&1 | tee ansible_tune.log

After the first run, a reboot is required for GRUB cmdline changes (1GiB hugepages, transparent_hugepage=never) to take effect. Subsequent runs are fully online with no reboot needed unless you change hugepages_1g_count.


Role Reference

nvidia_gpu

Tag: nvidia

Configures all 8 L40S GPUs for sustained CryoSPARC workloads.

| Task | What it does |
| --- | --- |
| Verify nvidia-smi | Fails fast if the driver is missing |
| nvidia-persistenced | Enables the persistence-mode service — eliminates GPU cold-start latency on the first CUDA call |
| Persistence mode | `nvidia-smi -pm 1` — all GPUs |
| Compute mode | DEFAULT (mode 0) — allows multiple processes per GPU, required for CryoSPARC multi-job scheduling |
| Power limit | Set to `{{ gpu_power_limit_watts }}` W (default: 350 W) on each GPU |
| Clock lock | `--lock-gpu-clocks={{ gpu_clock_min_mhz }},{{ gpu_clock_max_mhz }}` (default: 1350–2520 MHz) — eliminates frequency throttling under sustained load |
| ECC | Optional disable via `gpu_disable_ecc: true` — recovers ~2.7 GiB VRAM per GPU but requires a reboot |
| nvidia-fabricmanager | Checked via `systemctl list-unit-files` (not present on PCIe-only L40S — safe no-op) |
| GPU inventory | Prints index, name, VRAM, persistence, power limit, and current clocks to Ansible output |
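Applied by hand, the role's GPU settings map to a handful of nvidia-smi invocations. This sketch assumes the documented defaults (350 W cap, 1350–2520 MHz clocks); it illustrates the effect, not the role's literal tasks:

```shell
# Illustrative equivalent of the role's GPU tuning (defaults assumed)
nvidia-smi -pm 1                          # persistence mode on all GPUs
nvidia-smi -c DEFAULT                     # compute mode 0: multiple processes per GPU
nvidia-smi -pl 350                        # per-GPU power cap, watts
nvidia-smi --lock-gpu-clocks=1350,2520    # pin graphics clocks, MHz

# Inventory, as printed by the role
nvidia-smi --query-gpu=index,name,memory.total,persistence_mode,power.limit,clocks.current.graphics \
  --format=csv
```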

Handlers:

  • reboot required for ecc change — fires if ECC state was changed, prints a reminder that reboot is needed

nvme_tuning

Tag: nvme

Tunes the I/O stack for the NVMe RAID0 array (md0) that backs all LVs.

| Task | What it does |
| --- | --- |
| udev rules | Deploys `/etc/udev/rules.d/60-nvme-scheduler.rules` — sets scheduler, queue depth, and read-ahead on match |
| NVMe scheduler | `none` — bypasses the kernel I/O scheduler entirely; NVMe drives have their own internal queuing |
| NVMe queue depth | `nr_requests=1024` — allows deep pipelining for large sequential I/O |
| NVMe read-ahead | `read_ahead_kb=2048` — 2 MiB read-ahead for large particle stack access patterns |
| SATA scheduler | `mq-deadline` on sda/sdb — appropriate for rotational or slower flash |
| udev reload | `udevadm control --reload-rules && udevadm trigger` |

Template variables:

nvme_scheduler: "none"
nvme_nr_requests: 1024
nvme_read_ahead_kb: 2048
sata_scheduler: "mq-deadline"
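With those defaults, the rendered rules file would look roughly like the following — a plausible rendering of `60-nvme-scheduler.rules.j2`, not necessarily the template's exact output:

```
# /etc/udev/rules.d/60-nvme-scheduler.rules (illustrative rendering)
ACTION=="add|change", KERNEL=="nvme[0-9]*n[0-9]*", ATTR{queue/scheduler}="none", ATTR{queue/nr_requests}="1024", ATTR{queue/read_ahead_kb}="2048"
ACTION=="add|change", KERNEL=="sd[ab]", ATTR{queue/scheduler}="mq-deadline"
```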

kernel_tuning

Tag: kernel

The most complex role — handles sysctl, hugepages (both 2MiB runtime and 1GiB GRUB cmdline), THP disable, and ensures tmp.mount is never masked.

sysctl (/etc/sysctl.d/90-cryosparc.conf)

| Key | Value | Reason |
| --- | --- | --- |
| vm.swappiness | 5 | 2.2 TiB RAM — swap should be a last resort only |
| vm.dirty_ratio | 5 | ~110 GiB dirty limit before writeback — sustained NVMe throughput |
| vm.dirty_background_ratio | 2 | Background writeback starts at ~44 GiB |
| vm.dirty_writeback_centisecs | 100 | Writeback every 1 s (default 5 s) |
| vm.dirty_expire_centisecs | 3000 | Expire dirty pages after 30 s |
| vm.max_map_count | 16777216 | Critical for CUDA — 8 GPUs × many VMA regions per context. Default 65536 causes CUDA launch failures |
| vm.overcommit_memory | 1 | CUDA and cryo-EM allocate large virtual address maps |
| vm.nr_hugepages | 131072 | 256 GiB of 2 MiB hugepages — also set at runtime |
| vm.nr_overcommit_hugepages | 32768 | Burst headroom for hugepage demand |
| kernel.shmmax | 1073741824000 | ~1 TiB max shared memory segment (CUDA IPC) |
| kernel.shmall | 268435456 | Total shared memory pages |
| net.core.rmem_max / wmem_max | 134217728 | 128 MiB socket buffers for GPFS/NFS throughput |
| net.ipv4.tcp_congestion_control | bbr | Better throughput on high-bandwidth links |
| fs.inotify.max_user_watches | 1048576 | CryoSPARC watches job directories continuously |
| fs.file-max | 2097152 | System-wide file descriptor limit |
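Rendered from the table, the deployed file would read approximately as follows (values from above; ordering and comments illustrative):

```
# /etc/sysctl.d/90-cryosparc.conf (excerpt, illustrative)
vm.swappiness = 5
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 3000
vm.max_map_count = 16777216
vm.overcommit_memory = 1
vm.nr_hugepages = 131072
vm.nr_overcommit_hugepages = 32768
kernel.shmmax = 1073741824000
kernel.shmall = 268435456
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_congestion_control = bbr
fs.inotify.max_user_watches = 1048576
fs.file-max = 2097152
```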

Note: kernel.sched_min_granularity_ns and kernel.sched_wakeup_granularity_ns are not available on Rocky 9's default kernel (requires CONFIG_SCHED_DEBUG, excluded from production RHEL kernels). These were removed from the template. Scheduler tuning is handled instead by the tuned throughput-performance profile.

Hugepages

2MiB hugepages are allocated at runtime via ansible.posix.sysctl — no reboot needed. A warning is printed if the system cannot allocate the full count (fragmented memory).

1GiB hugepages are written to GRUB_CMDLINE_LINUX as:

hugepagesz=1G hugepages=64 hugepagesz=2M hugepages=131072 transparent_hugepage=never

This requires one reboot to take effect. The task is idempotent — it strips any existing hugepage/THP tokens before appending, so re-runs do not duplicate kernel args.
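The strip-then-append behaviour can be sketched in plain shell. These sed expressions are hypothetical — a minimal model of the idempotency logic, not the role's actual implementation:

```shell
# Start from a cmdline that already carries stale hugepage tokens
line='GRUB_CMDLINE_LINUX="rhgb quiet hugepagesz=1G hugepages=32 transparent_hugepage=never"'

# 1. Strip any existing hugepage/THP tokens so re-runs never duplicate them
stripped=$(printf '%s\n' "$line" | sed -E 's/ ?(hugepagesz|hugepages|transparent_hugepage)=[^" ]*//g')

# 2. Append the desired tokens just before the closing quote
new=$(printf '%s\n' "$stripped" | sed 's/"$/ hugepagesz=1G hugepages=64 hugepagesz=2M hugepages=131072 transparent_hugepage=never"/')

echo "$new"
```

Running the same two steps against `$new` again yields an identical line, which is what makes repeated playbook runs safe.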

THP (Transparent Hugepages) is disabled both at runtime (/sys/kernel/mm/transparent_hugepage/enabled) and persisted via /etc/rc.d/rc.local. THP causes latency spikes and memory bloat under cryo-EM workloads — always set to never.
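The rc.local persistence amounts to something like the following (illustrative; the role's actual lines may differ):

```shell
# Appended to /etc/rc.d/rc.local — the file must be executable
# (chmod +x /etc/rc.d/rc.local) or systemd's rc-local service skips it
echo never > /sys/kernel/mm/transparent_hugepage/enabled
```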

tmp.mount — Critical Warning

DO NOT mask tmp.mount on this system.

This role explicitly unmasks tmp.mount. Masking it breaks PrivateTmp namespace sandboxing used by dbus-broker, journald, and other core systemd services — even when /tmp is backed by a dedicated LV via fstab. The fstab entry is sufficient; tmp.mount does not conflict with it.

Masking tmp.mount caused a complete boot failure on this system: dbus-broker exited with status=226/NAMESPACE → NetworkManager failed → no network → no SSH. See Known Issues.

Handlers:

  • apply sysctl — runs sysctl --system
  • rebuild grub — runs grub2-mkconfig -o /boot/grub2/grub.cfg

cpu_tuning

Tag: cpu

| Task | What it does |
| --- | --- |
| Install kernel-tools | Provides `cpupower` |
| Install tuned | System tuning daemon |
| tuned profile | Set to `throughput-performance` — disables power saving, maximises CPU throughput, handles scheduler tuning that sysctl keys cannot |
| CPU governor | `performance` via `cpupower frequency-set` — locks all cores to max frequency |
| numactl | Installed and NUMA topology printed to output for lane assignment reference |

NUMA topology (L40S on this server):

| NUMA Node | GPU Indices | PCIe Buses |
| --- | --- | --- |
| Node 0 | GPUs 0–3 | 06, 07, 46, 47 |
| Node 1 | GPUs 4–7 | 87, C3, C4, C5 |

Configure two CryoSPARC worker lanes aligned to these NUMA nodes for best memory locality. Verify with nvidia-smi topo -m.


cryosparc_prep

Tag: cryosparc

Final readiness checks and CryoSPARC-specific configuration.

| Task | What it does |
| --- | --- |
| Verify /scratch mounted | Fails with a helpful message if not — confirms the storage script ran |
| Check noatime | Warns if /scratch is missing the `noatime` mount option |
| Create cache dir | `{{ cryosparc_cache_dir }}` (default: `/scratch/cryosparc_cache`) |
| Set ownership | `chown` to `{{ cryosparc_user }}` if `cryosparc_user_manage_ownership: true` |
| restorecon | Restores SELinux context on the cache directory |
| Mount summary | Prints `df -hT` for /home /tmp /var/tmp /scratch |
| Config instructions | Prints the exact `cryosparcw connect` command with correct path and quota |
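With the defaults in this README, the printed guidance would resemble the standard `cryosparcw connect` invocation below. Hostnames and the install path are placeholders; run it as the CryoSPARC service account:

```shell
/path/to/cryosparc_worker/bin/cryosparcw connect \
  --worker cryo-worker.example.org \
  --master cryosparc-master.example.org \
  --ssdpath /scratch/cryosparc_cache \
  --ssdquota 409600
```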

Variables

All variables live in group_vars/cryosparc_workers.yml. Edit there — never hardcode values in tasks.
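For example, capping the GPUs at a lower power limit and skipping the cache-dir chown on a host where the service account doesn't exist yet is a two-line change:

```yaml
# group_vars/cryosparc_workers.yml (excerpt)
gpu_power_limit_watts: 300
cryosparc_user_manage_ownership: false
```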

GPU

| Variable | Default | Description |
| --- | --- | --- |
| gpu_count | 8 | Number of GPUs |
| gpu_model | L40S | Used in debug output |
| gpu_power_limit_watts | 350 | Per-GPU power cap (max for L40S is 350 W) |
| gpu_clock_min_mhz | 1350 | Minimum locked graphics clock |
| gpu_clock_max_mhz | 2520 | Maximum locked graphics clock (boost clock) |
| gpu_disable_ecc | false | Set true to recover ~2.7 GiB VRAM/GPU (reboot required) |

Hugepages

| Variable | Default | Description |
| --- | --- | --- |
| hugepages_2m_count | 131072 | 2 MiB hugepages = 256 GiB (runtime, no reboot) |
| hugepages_1g_count | 64 | 1 GiB hugepages = 64 GiB (GRUB cmdline, reboot required) |
| transparent_hugepage | never | THP setting — never change from `never` on cryo-EM nodes |
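A quick sanity check that the counts and the sizes quoted above agree:

```python
# Verify the hugepage arithmetic from the defaults table
MiB = 1024 ** 2
GiB = 1024 ** 3

two_mib_total = 131072 * 2 * MiB   # hugepages_2m_count x 2 MiB each
one_gib_total = 64 * 1 * GiB       # hugepages_1g_count x 1 GiB each

print(two_mib_total // GiB)  # 256 GiB of 2 MiB pages
print(one_gib_total // GiB)  # 64 GiB of 1 GiB pages
```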

CryoSPARC

| Variable | Default | Description |
| --- | --- | --- |
| cryosparc_cache_dir | /scratch/cryosparc_cache | SSD cache path |
| cryosparc_cache_quota_mb | 409600 | Cache quota in MB (400 GiB — leaves 100 GiB headroom) |
| cryosparc_user | svc_rmlcryoprd1 | OS service account that owns the cache dir |
| cryosparc_user_manage_ownership | true | Set false if the user doesn't exist yet |

NVMe I/O

| Variable | Default | Description |
| --- | --- | --- |
| nvme_scheduler | none | I/O scheduler for NVMe devices |
| nvme_nr_requests | 1024 | Queue depth per NVMe device |
| nvme_read_ahead_kb | 2048 | Read-ahead in KiB |
| sata_scheduler | mq-deadline | Scheduler for SATA devices |

CPU

| Variable | Default | Description |
| --- | --- | --- |
| tuned_profile | throughput-performance | tuned profile |
| cpu_governor | performance | cpupower frequency governor |

Kernel sysctl

| Variable | Default | Notes |
| --- | --- | --- |
| vm_swappiness | 5 | |
| vm_dirty_ratio | 5 | |
| vm_dirty_background_ratio | 2 | |
| vm_dirty_writeback_centisecs | 100 | |
| vm_dirty_expire_centisecs | 3000 | |
| vm_max_map_count | 16777216 | Must be ≥ 16M for 8-GPU CUDA workloads |
| vm_overcommit_memory | 1 | |
| fs_inotify_max_user_watches | 1048576 | |
| fs_inotify_max_user_instances | 4096 | |
| fs_file_max | 2097152 | |
| kernel_shmmax | 1073741824000 | ~1 TiB |
| kernel_shmall | 268435456 | |
| net_core_rmem_max | 134217728 | 128 MiB |
| net_core_wmem_max | 134217728 | 128 MiB |

Optimizations Applied

GPU

  • Persistence mode eliminates the ~500ms GPU initialization delay on the first CUDA call of each job — critical for CryoSPARC's short-lived GPU processes
  • Clock locking prevents the GPU from throttling during compute-bound phases and eliminates frequency ramp-up latency between jobs
  • DEFAULT compute mode allows CryoSPARC to schedule multiple jobs to the same GPU simultaneously, maximising utilisation during mixed workloads

Memory

  • 256 GiB of 2MiB hugepages pre-allocated at runtime for CUDA pinned memory and large array allocations common in CTF estimation and 2D classification
  • 64 GiB of 1GiB hugepages in the kernel cmdline for very large contiguous allocations (3D refinement volumes)
  • THP disabled — transparent hugepages cause unpredictable latency spikes when the kernel attempts to collapse/split pages during cryo-EM I/O bursts
  • vm.max_map_count=16M — the default of 65536 is insufficient for 8 GPUs under load; CUDA requires hundreds of VMA regions per context and will fail with CUDA_ERROR_OUT_OF_MEMORY or launch errors without this

Storage

  • NVMe scheduler none — modern NVMe controllers implement deep command queuing in hardware across multiple submission queues. Inserting a kernel I/O scheduler adds latency with no benefit
  • Queue depth 1024 — allows the NVMe controller to reorder and coalesce deeply pipelined requests from concurrent CryoSPARC workers
  • Read-ahead 2048K — aligns with CryoSPARC's large sequential access pattern when loading particle stacks and micrographs
  • noatime,nodiratime on /scratch — eliminates inode update writes on every SSD cache read
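Using the LV names from this system, the corresponding fstab entry for /scratch would be (illustrative — the mount options are the point):

```
/dev/mapper/system-lscratch  /scratch  xfs  defaults,noatime,nodiratime  0 0
```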

CPU / OS

  • throughput-performance tuned profile — disables CPU power saving states (C-states), sets CPU frequency scaling to max, and tunes the kernel scheduler for throughput over latency
  • performance CPU governor — all cores run at maximum frequency; avoids frequency ramp latency when CryoSPARC spawns CPU-side preprocessing workers
  • BBR congestion control — better throughput for GPFS/NFS data ingestion from bigsky on high-bandwidth links
  • Large dirty ratios — allows up to ~110 GiB of dirty write cache before kernel writeback, sustaining NVMe write throughput during movie stack imports

Known Issues & Fixes

These bugs were encountered and resolved during initial deployment. They are documented here so future administrators understand the design decisions.

1. tmp.mount masking → boot failure

Symptom: After playbook run + reboot, system came up with no network. dbus-broker failing: status=226/NAMESPACE. NetworkManager dependency failed. SSH inaccessible.

Root cause: Masking tmp.mount prevents systemd from setting up the /run/systemd/unit-root private mount namespace that dbus-broker (and many other services with PrivateTmp=yes) requires. The LV-backed /tmp via fstab is completely unrelated — systemd respects fstab mounts and does not overlay them with tmpfs. Masking tmp.mount was unnecessary and catastrophic.

Fix: kernel_tuning now runs systemctl unmask tmp.mount instead. The fstab entry alone is sufficient.

Recovery path used:

  1. Boot to init=/bin/bash selinux=0 via GRUB editor
  2. passwd root + enable PermitRootLogin yes in sshd_config
  3. Reboot → log in as root via iKVM console
  4. systemctl unmask tmp.mount && systemctl start dbus-broker
  5. systemctl start NetworkManager → SSH restored
  6. Fix fstab trailing commas, reboot cleanly

2. nvidia-fabricmanager check hard-failed

Symptom: ansible.builtin.systemd with ignore_errors: true still caused unreliable when: condition evaluation when the unit didn't exist.

Fix: Replaced with shell check:

- name: Check whether nvidia-fabricmanager unit exists
  ansible.builtin.shell: >
    systemctl list-unit-files nvidia-fabricmanager.service --no-legend
    | grep -q nvidia-fabricmanager
  register: fabricmanager_check
  changed_when: false
  failed_when: false

# later tasks are gated with: when: fabricmanager_check.rc == 0

3. sysctl keys not available on Rocky 9

Symptom: sysctl -p failed on kernel.sched_min_granularity_ns and kernel.sched_wakeup_granularity_ns.

Root cause: These keys require CONFIG_SCHED_DEBUG which is excluded from production RHEL/Rocky kernels.

Fix: Removed from 90-cryosparc.conf.j2. Added --ignore flag to sysctl -p invocation as defence-in-depth.

4. GRUB handler wrote to wrong path

Symptom: grub2-mkconfig wrote to /boot/efi/EFI/rocky/grub.cfg (the EFI wrapper file) — Rocky 9 ignores this file; it reads /boot/grub2/grub.cfg.

Fix: Hardcoded handler to grub2-mkconfig -o /boot/grub2/grub.cfg.

5. cryosparc_user defaulted to nonexistent user

Symptom: chown failed: failed to look up user cryosparc

Fix: group_vars/cryosparc_workers.yml: cryosparc_user: "svc_rmlcryoprd1"

6. SELinux autorelabel after storage reconfiguration

Symptom: touch /.autorelabel (added by 01_reconfig_storage.sh) triggered a full filesystem relabel on the first post-script reboot. New /tmp and /var/tmp LV mounts received incorrect SELinux contexts, which compounded the dbus-broker failure.

Fix: Remove /.autorelabel before rebooting after storage changes, and set SELINUX=permissive temporarily. Run restorecon -Rv /tmp /var/tmp after confirming the system boots cleanly, then restore SELINUX=enforcing.


Post-Run Verification

Run after the playbook (and reboot, if GRUB was updated):

# Storage mounts
df -hT /home /tmp /var/tmp /scratch

# GPU — persistence, clocks, power
nvidia-smi --query-gpu=index,persistence_mode,clocks.current.graphics,power.limit \
  --format=csv

# Hugepages
grep -E 'HugePages|Hugepagesize' /proc/meminfo

# THP — should show [never]
cat /sys/kernel/mm/transparent_hugepage/enabled

# NVMe scheduler — should show [none]
for d in /sys/block/nvme*n*; do
  echo "$(basename $d): $(cat $d/queue/scheduler)"
done

# tuned profile
tuned-adm active

# CPU governor — should show 'performance' for all CPUs
cpupower frequency-info -p | grep governor

# Key services
systemctl status dbus-broker NetworkManager sshd nvidia-persistenced --no-pager

# sysctl spot check
sysctl vm.max_map_count vm.swappiness vm.nr_hugepages

Reboot Requirements

| Change | Reboot needed? |
| --- | --- |
| First run (GRUB cmdline updated) | Yes — for 1 GiB hugepages and `transparent_hugepage=never` |
| Subsequent runs (no GRUB change) | No |
| hugepages_1g_count changed | Yes |
| gpu_disable_ecc: true | Yes |
| All other variable changes | No |

Extras

[admin@cryo-worker ~]$ for d in /sys/block/nvme*n*; do echo "$(basename $d): $(cat $d/queue/scheduler)"; done
nvme0n1: [none] mq-deadline kyber bfq
nvme1n1: [none] mq-deadline kyber bfq
nvme2n1: [none] mq-deadline kyber bfq
nvme3n1: [none] mq-deadline kyber bfq
[admin@cryo-worker ~]$ df -hT /home /tmp /var/tmp /scratch
Filesystem                  Type  Size  Used Avail Use% Mounted on
/dev/mapper/system-home     xfs   2.0T   15G  2.0T   1% /home
/dev/mapper/system-tmp      xfs   200G  1.5G  199G   1% /tmp
/dev/mapper/system-var_tmp  xfs   200G  1.5G  199G   1% /var/tmp
/dev/mapper/system-lscratch xfs   500G  3.6G  497G   1% /scratch
[admin@cryo-worker ~]$ nvidia-smi --query-gpu=index,name,memory.total,memory.free --format=csv
index, name, memory.total [MiB], memory.free [MiB]
0, NVIDIA L40S, 46068 MiB, 45469 MiB
1, NVIDIA L40S, 46068 MiB, 45469 MiB
2, NVIDIA L40S, 46068 MiB, 45469 MiB
3, NVIDIA L40S, 46068 MiB, 45469 MiB
4, NVIDIA L40S, 46068 MiB, 45469 MiB
5, NVIDIA L40S, 46068 MiB, 45469 MiB
6, NVIDIA L40S, 46068 MiB, 45469 MiB
7, NVIDIA L40S, 46068 MiB, 45469 MiB
[admin@cryo-worker ~]$ nvidia-smi topo -p2p r
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7
 GPU0   X       OK      OK      OK      OK      OK      OK      OK
 GPU1   OK      X       OK      OK      OK      OK      OK      OK
 GPU2   OK      OK      X       OK      OK      OK      OK      OK
 GPU3   OK      OK      OK      X       OK      OK      OK      OK
 GPU4   OK      OK      OK      OK      X       OK      OK      OK
 GPU5   OK      OK      OK      OK      OK      X       OK      OK
 GPU6   OK      OK      OK      OK      OK      OK      X       OK
 GPU7   OK      OK      OK      OK      OK      OK      OK      X

Legend:

  X    = Self
  OK   = Status Ok
  CNS  = Chipset not supported
  GNS  = GPU not supported
  TNS  = Topology not supported
  NS   = Not supported
  U    = Unknown
[admin@cryo-worker ~]$ nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PIX     NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     0-63    0               N/A
GPU1    PIX      X      NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     0-63    0               N/A
GPU2    NODE    NODE     X      PIX     SYS     SYS     SYS     SYS     SYS     SYS     0-63    0               N/A
GPU3    NODE    NODE    PIX      X      SYS     SYS     SYS     SYS     SYS     SYS     0-63    0               N/A
GPU4    SYS     SYS     SYS     SYS      X      NODE    NODE    NODE    PIX     PIX     64-127  1               N/A
GPU5    SYS     SYS     SYS     SYS     NODE     X      PIX     PIX     NODE    NODE    64-127  1               N/A
GPU6    SYS     SYS     SYS     SYS     NODE    PIX      X      PIX     NODE    NODE    64-127  1               N/A
GPU7    SYS     SYS     SYS     SYS     NODE    PIX     PIX      X      NODE    NODE    64-127  1               N/A
NIC0    SYS     SYS     SYS     SYS     PIX     NODE    NODE    NODE     X      PIX
NIC1    SYS     SYS     SYS     SYS     PIX     NODE    NODE    NODE    PIX      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
[admin@cryo-worker ~]$ lsblk
NAME                       MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda                          8:0    0 894.3G  0 disk
sdb                          8:16   0 894.3G  0 disk
nvme2n1                    259:0    0    14T  0 disk
nvme1n1                    259:1    0    14T  0 disk
nvme3n1                    259:2    0    14T  0 disk
└─nvme3n1p1                259:3    0    14T  0 part
  └─md0                      9:0    0  27.9T  0 raid0
    ├─system-root          253:0    0    50G  0 lvm   /
    ├─system-swap          253:1    0    16G  0 lvm   [SWAP]
    ├─system-var_crash     253:2    0    50G  0 lvm   /var/crash
    ├─system-var_log_audit 253:3    0   100G  0 lvm   /var/log/audit
    ├─system-var_log       253:4    0   100G  0 lvm   /var/log
    ├─system-var           253:5    0    50G  0 lvm   /var
    ├─system-home          253:6    0     2T  0 lvm   /home
    ├─system-lscratch      253:7    0   500G  0 lvm   /scratch
    ├─system-tmp           253:8    0   200G  0 lvm   /tmp
    └─system-var_tmp       253:9    0   200G  0 lvm   /var/tmp
nvme0n1                    259:4    0    14T  0 disk
├─nvme0n1p1                259:5    0   600M  0 part  /boot/efi
├─nvme0n1p2                259:6    0     2G  0 part  /boot
└─nvme0n1p3                259:7    0    14T  0 part
  └─md0                      9:0    0  27.9T  0 raid0
    ├─system-root          253:0    0    50G  0 lvm   /
    ├─system-swap          253:1    0    16G  0 lvm   [SWAP]
    ├─system-var_crash     253:2    0    50G  0 lvm   /var/crash
    ├─system-var_log_audit 253:3    0   100G  0 lvm   /var/log/audit
    ├─system-var_log       253:4    0   100G  0 lvm   /var/log
    ├─system-var           253:5    0    50G  0 lvm   /var
    ├─system-home          253:6    0     2T  0 lvm   /home
    ├─system-lscratch      253:7    0   500G  0 lvm   /scratch
    ├─system-tmp           253:8    0   200G  0 lvm   /tmp
    └─system-var_tmp       253:9    0   200G  0 lvm   /var/tmp
[admin@cryo-worker ~]$ grep -E 'HugePages|Hugepagesize' /proc/meminfo
AnonHugePages:      2048 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:   131072
HugePages_Free:    131072
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
TASK [Playbook complete — summary] **********************************************************************************************************************************************************************************************************
ok: [cryo-worker.niaid.nih.gov] => {
    "msg": [
        "================================================================",
        " Tuning complete: cryo-worker.niaid.nih.gov",
        "================================================================",
        " Applied immediately (no reboot needed):",
        "   nvidia-persistenced, clock lock, power limits",
        "   NVMe scheduler=none, nr_requests=1024",
        "   2MiB hugepages allocated, THP disabled",
        "   sysctl tuning (vm, net, fs, kernel)",
        "   tuned profile=throughput-performance, governor=performance",
        "   /scratch/cryosparc_cache created and permissioned",
        "",
        " Requires reboot:",
        "   1GiB hugepages (hugepagesz=1G in GRUB cmdline)",
        "   ECC change (only if gpu_disable_ecc: true)",
        "",
        " When ready: sudo reboot",
        "================================================================"
    ]
}

PLAY RECAP **********************************************************************************************************************************************************************************************************************************
cryo-worker.niaid.nih.gov    : ok=55   changed=8    unreachable=0    failed=0    skipped=5    rescued=0    ignored=0
