This collection of roles tunes a CryoSPARC GPU worker node for maximum cryo-EM throughput. It is idempotent — safe to re-run after kernel updates, hardware changes, or configuration drift.
The roles are namespaced under roles/component/cryosparc_tune/ and referenced in the playbook as cryosparc_tune/<subrole>. Each sub-role is independently tagged so individual components can be applied without running the full playbook.
Developed on:
- SuperMicro AS-4125GS-TNRT2
- Rocky Linux 9.5
- 8x NVIDIA L40S
- 2.2TiB RAM
- 27.9 TiB NVMe RAID0 (4× drives)
Purpose: Dedicated CryoSPARC worker node for cryo-EM processing
"It ain't nuttin' but a G thing, bay-bay."
- Directory Structure
- Quick Start
- Role Reference
- Variables
- Optimizations Applied
- Known Issues & Fixes
- Post-Run Verification
- Reboot Requirements
- Extras
cryosparc_tune/
├── group_vars/
│   └── cryosparc_workers.yml          # ALL tunable parameters (edit here)
├── cryosparc_tune.yml                 # Main Ansible playbook
└── roles/
    ├── nvidia_gpu/                    # GPU persistence, clocks, power limits
    │   ├── handlers/
    │   └── tasks/
    ├── nvme_tuning/                   # I/O scheduler, queue depth, udev rules
    │   ├── handlers/
    │   ├── tasks/
    │   └── templates/
    │       └── 60-nvme-scheduler.rules.j2
    ├── kernel_tuning/                 # sysctl, hugepages, GRUB cmdline, tmpfiles
    │   ├── handlers/
    │   ├── tasks/
    │   └── templates/
    │       └── 90-cryosparc.conf.j2
    ├── cpu_tuning/                    # tuned profile, CPU governor, NUMA
    │   └── tasks/
    └── cryosparc_prep/                # /scratch cache dir, mount verification, guidance
        └── tasks/
# Full run
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml
# Dry run (check + diff, no changes)
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --check --diff
# Single role by tag
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --tags nvidia
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --tags nvme
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --tags kernel
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --tags cpu
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml --tags cryosparc
# Verbose output for debugging
ansible-playbook --private-key=~/.ssh/dev -u admin -l cryo-worker* playbooks/cryosparc_tune.yml -vv 2>&1 | tee ansible_tune.log

After the first run, a reboot is required for the GRUB cmdline changes (1GiB hugepages,
transparent_hugepage=never) to take effect. Subsequent runs are fully online, with no reboot needed unless you change hugepages_1g_count.
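To see whether the 1GiB hugepage cmdline is already live (and therefore whether a reboot is still pending), a minimal check — a sketch, assuming a Linux host with /proc mounted:

```shell
# Compare the running kernel cmdline against the expected hugepage token.
if grep -q 'hugepagesz=1G' /proc/cmdline 2>/dev/null; then
  echo "1GiB hugepages active in running kernel"
else
  echo "GRUB change not live yet - reboot pending (or role not yet run)"
fi
```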
Tag: nvidia
Configures all 8 L40S GPUs for sustained CryoSPARC workloads.
| Task | What it does |
|---|---|
| Verify `nvidia-smi` | Fails fast if the driver is missing |
| `nvidia-persistenced` | Enables the persistence-mode service — eliminates GPU cold-start latency on the first CUDA call |
| Persistence mode | `nvidia-smi -pm 1` — all GPUs |
| Compute mode | DEFAULT (mode 0) — allows multiple processes per GPU, required for CryoSPARC multi-job scheduling |
| Power limit | Set to {{ gpu_power_limit_watts }}W (default: 350W) on each GPU |
| Clock lock | `--lock-gpu-clocks={{ gpu_clock_min_mhz }},{{ gpu_clock_max_mhz }}` (default: 1350–2520 MHz) — eliminates frequency throttling under sustained load |
| ECC | Optional disable via `gpu_disable_ecc: true` — recovers ~2.7 GiB VRAM per GPU but requires reboot |
| `nvidia-fabricmanager` | Checked via `systemctl list-unit-files` (not present on PCIe-only L40S — safe no-op) |
| GPU inventory | Prints index, name, VRAM, persistence, power limit, and current clocks to Ansible output |
Handlers:
- `reboot required for ecc change` — fires if the ECC state was changed; prints a reminder that a reboot is needed
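The per-GPU settings above boil down to three `nvidia-smi` calls. A sketch that assembles them from the group_vars defaults and prints them rather than running them (the flags are standard nvidia-smi options; the values are this repo's defaults):

```shell
# Build the nvidia-smi invocations from the tunables (defaults shown).
power_limit=350   # gpu_power_limit_watts
clk_min=1350      # gpu_clock_min_mhz
clk_max=2520      # gpu_clock_max_mhz

for cmd in \
  "nvidia-smi -pm 1" \
  "nvidia-smi -pl ${power_limit}" \
  "nvidia-smi --lock-gpu-clocks=${clk_min},${clk_max}"
do
  echo "$cmd"   # on the worker node, run these with sudo instead of echoing
done
```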
Tag: nvme
Tunes the I/O stack for the NVMe RAID0 array (md0) that backs all LVs.
| Task | What it does |
|---|---|
| udev rules | Deploys /etc/udev/rules.d/60-nvme-scheduler.rules — sets scheduler, queue depth, and read-ahead on match |
| NVMe scheduler | none — bypasses the kernel I/O scheduler entirely; NVMe drives have their own internal queuing |
| NVMe queue depth | nr_requests=1024 — allows deep pipelining for large sequential I/O |
| NVMe read-ahead | read_ahead_kb=2048 — 2 MiB read-ahead for large particle stack access patterns |
| SATA scheduler | mq-deadline on sda/sdb — appropriate for rotational or slower flash |
| udev reload | udevadm control --reload-rules && udevadm trigger |
Template variables:
nvme_scheduler: "none"
nvme_nr_requests: 1024
nvme_read_ahead_kb: 2048
sata_scheduler: "mq-deadline"

Tag: kernel
The most complex role — handles sysctl, hugepages (both 2MiB runtime and 1GiB
GRUB cmdline), THP disable, and ensures tmp.mount is never masked.
| Key | Value | Reason |
|---|---|---|
| `vm.swappiness` | 5 | 2.2 TiB RAM — swap should be a last resort only |
| `vm.dirty_ratio` | 5 | ~110 GiB dirty limit before writeback — sustained NVMe throughput |
| `vm.dirty_background_ratio` | 2 | Background writeback starts at ~44 GiB |
| `vm.dirty_writeback_centisecs` | 100 | Writeback every 1s (default 5s) |
| `vm.dirty_expire_centisecs` | 3000 | Expire dirty pages after 30s |
| `vm.max_map_count` | 16777216 | Critical for CUDA — 8 GPUs × many VMA regions per context. Default 65536 causes CUDA launch failures |
| `vm.overcommit_memory` | 1 | CUDA and cryo-EM tools allocate large virtual address maps |
| `vm.nr_hugepages` | 131072 | 256 GiB of 2MiB hugepages — also set at runtime |
| `vm.nr_overcommit_hugepages` | 32768 | Burst headroom for hugepage demand |
| `kernel.shmmax` | 1073741824000 | ~1 TiB max shared memory segment (CUDA IPC) |
| `kernel.shmall` | 268435456 | Total shared memory pages |
| `net.core.rmem_max` / `wmem_max` | 134217728 | 128 MiB socket buffers for GPFS/NFS throughput |
| `net.ipv4.tcp_congestion_control` | bbr | Better throughput on high-bandwidth links |
| `fs.inotify.max_user_watches` | 1048576 | CryoSPARC watches job directories continuously |
| `fs.file-max` | 2097152 | System-wide file descriptor limit |
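The sizes quoted above follow from simple arithmetic; a quick shell sanity check (assuming ~2.2 TiB ≈ 2200 GiB of RAM, per the hardware list):

```shell
# 2MiB hugepages: 131072 pages x 2 MiB = 256 GiB
hp_gib=$(( 131072 * 2 / 1024 ))
echo "2MiB hugepage pool: ${hp_gib} GiB"

# Dirty-page thresholds as a percentage of RAM
ram_gib=2200
echo "dirty_ratio=5       -> $(( ram_gib * 5 / 100 )) GiB hard limit"
echo "dirty_background=2  -> $(( ram_gib * 2 / 100 )) GiB background start"
```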
Note:
`kernel.sched_min_granularity_ns` and `kernel.sched_wakeup_granularity_ns` are not available on Rocky 9's default kernel (they require `CONFIG_SCHED_DEBUG`, which is excluded from production RHEL kernels). These keys were removed from the template. Scheduler tuning is handled instead by the `tuned` throughput-performance profile.
2MiB hugepages are allocated at runtime via ansible.posix.sysctl — no
reboot needed. A warning is printed if the system cannot allocate the full
count (fragmented memory).
1GiB hugepages are written to GRUB_CMDLINE_LINUX as:
hugepagesz=1G hugepages=64 hugepagesz=2M hugepages=131072 transparent_hugepage=never
This requires one reboot to take effect. The task is idempotent — it strips any existing hugepage/THP tokens before appending, so re-runs do not duplicate kernel args.
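The strip-and-append step can be illustrated in plain shell — a sketch only, since the role does this with Ansible modules, and the sample cmdline below is made up:

```shell
# Start from a cmdline that already carries stale hugepage/THP tokens.
cmdline='quiet rhgb hugepagesz=1G hugepages=32 transparent_hugepage=always'

# Strip any existing hugepage/THP tokens so re-runs never duplicate them.
stripped=$(echo "$cmdline" \
  | sed -E 's/(default_hugepagesz|hugepagesz|hugepages|transparent_hugepage)=[^ ]*//g')

# Append the desired tokens; unquoted echo collapses leftover whitespace.
new=$(echo $stripped hugepagesz=1G hugepages=64 hugepagesz=2M hugepages=131072 transparent_hugepage=never)
echo "$new"
```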
THP (Transparent Hugepages) is disabled both at runtime
(/sys/kernel/mm/transparent_hugepage/enabled) and persisted via
/etc/rc.d/rc.local. THP causes latency spikes and memory bloat under
cryo-EM workloads — always set to never.
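The persisted portion is just two lines appended to rc.local — a sketch of the fragment (the role's exact lines may differ; paths are the standard kernel sysfs locations, and rc.local must be executable for systemd to run it):

```shell
# Appended to /etc/rc.d/rc.local to keep THP disabled across reboots.
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```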
DO NOT mask `tmp.mount` on this system.
This role explicitly unmasks tmp.mount. Masking it breaks PrivateTmp
namespace sandboxing used by dbus-broker, journald, and other core
systemd services — even when /tmp is backed by a dedicated LV via fstab.
The fstab entry is sufficient; tmp.mount does not conflict with it.
Masking tmp.mount caused a complete boot failure on this system:
dbus-broker exited with status=226/NAMESPACE → NetworkManager failed
→ no network → no SSH. See Known Issues.
Handlers:
- `apply sysctl` — runs `sysctl --system`
- `rebuild grub` — runs `grub2-mkconfig -o /boot/grub2/grub.cfg`
Tag: cpu
| Task | What it does |
|---|---|
| Install kernel-tools | Provides `cpupower` |
| Install tuned | System tuning daemon |
| tuned profile | Set to `throughput-performance` — disables power saving, maximises CPU throughput, handles scheduler tuning that sysctl keys cannot |
| CPU governor | `performance` via `cpupower frequency-set` — locks all cores to max frequency |
| numactl | Installed; NUMA topology printed to output for lane-assignment reference |
NUMA topology (L40S on this server):
| NUMA Node | GPU Indices | PCIe Buses |
|---|---|---|
| Node 0 | GPUs 0–3 | 06, 07, 46, 47 |
| Node 1 | GPUs 4–7 | 87, C3, C4, C5 |
Configure two CryoSPARC worker lanes aligned to these NUMA nodes for best
memory locality. Verify with nvidia-smi topo -m.
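The GPU→NUMA mapping above is regular (GPUs 0–3 on node 0, 4–7 on node 1), so it can be derived mechanically — a tiny illustrative sketch:

```shell
# Derive the NUMA node for each GPU index (regular mapping on this chassis).
for gpu in 0 1 2 3 4 5 6 7; do
  node=$(( gpu / 4 ))
  echo "GPU ${gpu} -> NUMA node ${node} (lane${node})"
done
```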
Tag: cryosparc
Final readiness checks and CryoSPARC-specific configuration.
| Task | What it does |
|---|---|
| Verify /scratch mounted | Fails with a helpful message if not — confirms the storage script ran |
| Check noatime | Warns if /scratch is missing the noatime mount option |
| Create cache dir | {{ cryosparc_cache_dir }} (default: /scratch/cryosparc_cache) |
| Set ownership | chown to {{ cryosparc_user }} if cryosparc_user_manage_ownership: true |
| restorecon | Restores SELinux context on the cache directory |
| Mount summary | Prints df -hT for /home /tmp /var/tmp /scratch |
| Config instructions | Prints the exact cryosparcw connect command with correct path and quota |
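For reference, the printed connect command looks roughly like the following — a sketch only: `cryosparc-master.example.org` and the port are placeholders, the flags follow the CryoSPARC worker CLI, and the quota is in MB to match `cryosparc_cache_quota_mb`:

```shell
# Illustrative only - run as the CryoSPARC service account on the worker node.
cryosparcw connect \
  --worker "$(hostname -f)" \
  --master cryosparc-master.example.org \
  --port 39000 \
  --ssdpath /scratch/cryosparc_cache \
  --ssdquota 409600
```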
All variables live in group_vars/cryosparc_workers.yml. Edit there — never
hardcode values in tasks.
| Variable | Default | Description |
|---|---|---|
| `gpu_count` | 8 | Number of GPUs |
| `gpu_model` | L40S | Used in debug output |
| `gpu_power_limit_watts` | 350 | Per-GPU power cap (max for L40S is 350W) |
| `gpu_clock_min_mhz` | 1350 | Minimum locked graphics clock |
| `gpu_clock_max_mhz` | 2520 | Maximum locked graphics clock (boost clock) |
| `gpu_disable_ecc` | false | Set true to recover ~2.7 GiB VRAM/GPU (reboot required) |
| Variable | Default | Description |
|---|---|---|
| `hugepages_2m_count` | 131072 | 2MiB hugepages = 256 GiB (runtime, no reboot) |
| `hugepages_1g_count` | 64 | 1GiB hugepages = 64 GiB (GRUB cmdline, reboot required) |
| `transparent_hugepage` | never | THP setting — never change from `never` on cryo-EM nodes |
| Variable | Default | Description |
|---|---|---|
| `cryosparc_cache_dir` | /scratch/cryosparc_cache | SSD cache path |
| `cryosparc_cache_quota_mb` | 409600 | Cache quota in MB (400 GiB — leaves 100 GiB headroom) |
| `cryosparc_user` | svc_rmlcryoprd1 | OS service account that owns the cache dir |
| `cryosparc_user_manage_ownership` | true | Set false if the user doesn't exist yet |
| Variable | Default | Description |
|---|---|---|
| `nvme_scheduler` | none | I/O scheduler for NVMe devices |
| `nvme_nr_requests` | 1024 | Queue depth per NVMe device |
| `nvme_read_ahead_kb` | 2048 | Read-ahead in KiB |
| `sata_scheduler` | mq-deadline | Scheduler for SATA devices |
| Variable | Default | Description |
|---|---|---|
| `tuned_profile` | throughput-performance | tuned profile |
| `cpu_governor` | performance | cpupower frequency governor |
| Variable | Default | Notes |
|---|---|---|
| `vm_swappiness` | 5 | |
| `vm_dirty_ratio` | 5 | |
| `vm_dirty_background_ratio` | 2 | |
| `vm_dirty_writeback_centisecs` | 100 | |
| `vm_dirty_expire_centisecs` | 3000 | |
| `vm_max_map_count` | 16777216 | Must be ≥ 16M for 8-GPU CUDA workloads |
| `vm_overcommit_memory` | 1 | |
| `fs_inotify_max_user_watches` | 1048576 | |
| `fs_inotify_max_user_instances` | 4096 | |
| `fs_file_max` | 2097152 | |
| `kernel_shmmax` | 1073741824000 | ~1 TiB |
| `kernel_shmall` | 268435456 | |
| `net_core_rmem_max` | 134217728 | 128 MiB |
| `net_core_wmem_max` | 134217728 | 128 MiB |
- Persistence mode eliminates the ~500ms GPU initialization delay on the first CUDA call of each job — critical for CryoSPARC's short-lived GPU processes
- Clock locking prevents the GPU from throttling during compute-bound phases and eliminates frequency ramp-up latency between jobs
- DEFAULT compute mode allows CryoSPARC to schedule multiple jobs to the same GPU simultaneously, maximising utilisation during mixed workloads
- 256 GiB of 2MiB hugepages pre-allocated at runtime for CUDA pinned memory and large array allocations common in CTF estimation and 2D classification
- 64 GiB of 1GiB hugepages in the kernel cmdline for very large contiguous allocations (3D refinement volumes)
- THP disabled — transparent hugepages cause unpredictable latency spikes when the kernel attempts to collapse/split pages during cryo-EM I/O bursts
- `vm.max_map_count=16M` — the default of 65536 is insufficient for 8 GPUs under load; CUDA requires hundreds of VMA regions per context and will fail with `CUDA_ERROR_OUT_OF_MEMORY` or launch errors without this
- NVMe scheduler `none` — modern NVMe controllers implement their own internal command queuing; inserting a kernel I/O scheduler adds latency with no benefit
- Queue depth 1024 — allows the NVMe controller to reorder and coalesce deeply pipelined requests from concurrent CryoSPARC workers
- Read-ahead 2048K — aligns with CryoSPARC's large sequential access pattern when loading particle stacks and micrographs
- `noatime`, `nodiratime` on /scratch — eliminates inode-update writes on every SSD cache read
- `throughput-performance` tuned profile — disables CPU power-saving states (C-states), sets CPU frequency scaling to max, and tunes the kernel scheduler for throughput over latency
- `performance` CPU governor — all cores run at maximum frequency; avoids frequency-ramp latency when CryoSPARC spawns CPU-side preprocessing workers
- BBR congestion control — better throughput for GPFS/NFS data ingestion from bigsky on high-bandwidth links
- Large dirty ratios — allows up to ~110 GiB of dirty write cache before kernel writeback, sustaining NVMe write throughput during movie stack imports
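The `vm.max_map_count` requirement above can be checked on a running node with a short sketch (reads `/proc`, no privileges needed):

```shell
# Compare the live value against the tuned target.
want=16777216
have=$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)
if [ "$have" -ge "$want" ]; then
  echo "OK: vm.max_map_count=${have}"
else
  echo "WARN: vm.max_map_count=${have} < ${want} - expect CUDA launch failures"
fi
```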
These bugs were encountered and resolved during initial deployment. They are documented here so future administrators understand the design decisions.
Symptom: After playbook run + reboot, system came up with no network.
dbus-broker failing: status=226/NAMESPACE. NetworkManager dependency
failed. SSH inaccessible.
Root cause: Masking tmp.mount prevents systemd from setting up the
/run/systemd/unit-root private mount namespace that dbus-broker (and
many other services with PrivateTmp=yes) requires. The LV-backed /tmp
via fstab is completely unrelated — systemd respects fstab mounts and does
not overlay them with tmpfs. Masking tmp.mount was unnecessary and
catastrophic.
Fix: kernel_tuning now runs systemctl unmask tmp.mount instead.
The fstab entry alone is sufficient.
Recovery path used:
1. Boot with `init=/bin/bash selinux=0` via the GRUB editor
2. `passwd root` + enable `PermitRootLogin yes` in sshd_config
3. Reboot → log in as root via the iKVM console
4. `systemctl unmask tmp.mount && systemctl start dbus-broker`
5. `systemctl start NetworkManager` → SSH restored
6. Fix fstab trailing commas, reboot cleanly
Symptom: ansible.builtin.systemd with ignore_errors: true still
caused unreliable when: condition evaluation when the unit didn't exist.
Fix: Replaced with shell check:
ansible.builtin.shell: >
systemctl list-unit-files nvidia-fabricmanager.service --no-legend
| grep -q nvidia-fabricmanager
register: fabricmanager_check
failed_when: false
# then: when: fabricmanager_check.rc == 0

Symptom: sysctl -p failed on kernel.sched_min_granularity_ns and
kernel.sched_wakeup_granularity_ns.
Root cause: These keys require CONFIG_SCHED_DEBUG which is excluded
from production RHEL/Rocky kernels.
Fix: Removed from 90-cryosparc.conf.j2. Added --ignore flag to
sysctl -p invocation as defence-in-depth.
Symptom: grub2-mkconfig wrote to /boot/efi/EFI/rocky/grub.cfg
(the EFI wrapper file) — Rocky 9 ignores this file; it reads
/boot/grub2/grub.cfg.
Fix: Hardcoded handler to grub2-mkconfig -o /boot/grub2/grub.cfg.
Symptom: chown failed: failed to look up user cryosparc
Fix: group_vars/cryosparc_workers.yml: cryosparc_user: "svc_rmlcryoprd1"
Symptom: touch /.autorelabel (added by 01_reconfig_storage.sh)
triggered a full filesystem relabel on the first post-script reboot. New
/tmp and /var/tmp LV mounts received incorrect SELinux contexts, which
compounded the dbus-broker failure.
Fix: Remove /.autorelabel before rebooting after storage changes, and
set SELINUX=permissive temporarily. Run restorecon -Rv /tmp /var/tmp
after confirming the system boots cleanly, then restore SELINUX=enforcing.
Run after the playbook (and reboot, if GRUB was updated):
# Storage mounts
df -hT /home /tmp /var/tmp /scratch
# GPU — persistence, clocks, power
nvidia-smi --query-gpu=index,persistence_mode,clocks.current.graphics,power.limit \
--format=csv
# Hugepages
grep -E 'HugePages|Hugepagesize' /proc/meminfo
# THP — should show [never]
cat /sys/kernel/mm/transparent_hugepage/enabled
# NVMe scheduler — should show [none]
for d in /sys/block/nvme*n*; do
echo "$(basename $d): $(cat $d/queue/scheduler)"
done
# tuned profile
tuned-adm active
# CPU governor — should show 'performance' for all CPUs
cpupower frequency-info -p | grep governor
# Key services
systemctl status dbus-broker NetworkManager sshd nvidia-persistenced --no-pager
# sysctl spot check
sysctl vm.max_map_count vm.swappiness vm.nr_hugepages

| Change | Reboot needed? |
|---|---|
| First run (GRUB cmdline updated) | Yes — for 1GiB hugepages and transparent_hugepage=never |
| Subsequent runs (no GRUB change) | No |
| `hugepages_1g_count` changed | Yes |
| `gpu_disable_ecc: true` | Yes |
| All other variable changes | No |
[admin@cryo-worker ~]$ for d in /sys/block/nvme*n*; do echo "$(basename $d): $(cat $d/queue/scheduler)"; done
nvme0n1: [none] mq-deadline kyber bfq
nvme1n1: [none] mq-deadline kyber bfq
nvme2n1: [none] mq-deadline kyber bfq
nvme3n1: [none] mq-deadline kyber bfq
[admin@cryo-worker ~]$ df -hT /home /tmp /var/tmp /scratch
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/system-home xfs 2.0T 15G 2.0T 1% /home
/dev/mapper/system-tmp xfs 200G 1.5G 199G 1% /tmp
/dev/mapper/system-var_tmp xfs 200G 1.5G 199G 1% /var/tmp
/dev/mapper/system-lscratch xfs 500G 3.6G 497G 1% /scratch
[admin@cryo-worker ~]$ nvidia-smi --query-gpu=index,name,memory.total,memory.free --format=csv
index, name, memory.total [MiB], memory.free [MiB]
0, NVIDIA L40S, 46068 MiB, 45469 MiB
1, NVIDIA L40S, 46068 MiB, 45469 MiB
2, NVIDIA L40S, 46068 MiB, 45469 MiB
3, NVIDIA L40S, 46068 MiB, 45469 MiB
4, NVIDIA L40S, 46068 MiB, 45469 MiB
5, NVIDIA L40S, 46068 MiB, 45469 MiB
6, NVIDIA L40S, 46068 MiB, 45469 MiB
7, NVIDIA L40S, 46068 MiB, 45469 MiB
[admin@cryo-worker ~]$ nvidia-smi topo -p2p r
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7
GPU0 X OK OK OK OK OK OK OK
GPU1 OK X OK OK OK OK OK OK
GPU2 OK OK X OK OK OK OK OK
GPU3 OK OK OK X OK OK OK OK
GPU4 OK OK OK OK X OK OK OK
GPU5 OK OK OK OK OK X OK OK
GPU6 OK OK OK OK OK OK X OK
GPU7 OK OK OK OK OK OK OK X
Legend:
X = Self
OK = Status Ok
CNS = Chipset not supported
GNS = GPU not supported
TNS = Topology not supported
NS = Not supported
U = Unknown
[admin@cryo-worker ~]$ nvidia-smi topo -m
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X PIX NODE NODE SYS SYS SYS SYS SYS SYS 0-63 0 N/A
GPU1 PIX X NODE NODE SYS SYS SYS SYS SYS SYS 0-63 0 N/A
GPU2 NODE NODE X PIX SYS SYS SYS SYS SYS SYS 0-63 0 N/A
GPU3 NODE NODE PIX X SYS SYS SYS SYS SYS SYS 0-63 0 N/A
GPU4 SYS SYS SYS SYS X NODE NODE NODE PIX PIX 64-127 1 N/A
GPU5 SYS SYS SYS SYS NODE X PIX PIX NODE NODE 64-127 1 N/A
GPU6 SYS SYS SYS SYS NODE PIX X PIX NODE NODE 64-127 1 N/A
GPU7 SYS SYS SYS SYS NODE PIX PIX X NODE NODE 64-127 1 N/A
NIC0 SYS SYS SYS SYS PIX NODE NODE NODE X PIX
NIC1 SYS SYS SYS SYS PIX NODE NODE NODE PIX X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
[admin@cryo-worker ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 894.3G 0 disk
sdb 8:16 0 894.3G 0 disk
nvme2n1 259:0 0 14T 0 disk
nvme1n1 259:1 0 14T 0 disk
nvme3n1 259:2 0 14T 0 disk
└─nvme3n1p1 259:3 0 14T 0 part
└─md0 9:0 0 27.9T 0 raid0
├─system-root 253:0 0 50G 0 lvm /
├─system-swap 253:1 0 16G 0 lvm [SWAP]
├─system-var_crash 253:2 0 50G 0 lvm /var/crash
├─system-var_log_audit 253:3 0 100G 0 lvm /var/log/audit
├─system-var_log 253:4 0 100G 0 lvm /var/log
├─system-var 253:5 0 50G 0 lvm /var
├─system-home 253:6 0 2T 0 lvm /home
├─system-lscratch 253:7 0 500G 0 lvm /scratch
├─system-tmp 253:8 0 200G 0 lvm /tmp
└─system-var_tmp 253:9 0 200G 0 lvm /var/tmp
nvme0n1 259:4 0 14T 0 disk
├─nvme0n1p1 259:5 0 600M 0 part /boot/efi
├─nvme0n1p2 259:6 0 2G 0 part /boot
└─nvme0n1p3 259:7 0 14T 0 part
└─md0 9:0 0 27.9T 0 raid0
├─system-root 253:0 0 50G 0 lvm /
├─system-swap 253:1 0 16G 0 lvm [SWAP]
├─system-var_crash 253:2 0 50G 0 lvm /var/crash
├─system-var_log_audit 253:3 0 100G 0 lvm /var/log/audit
├─system-var_log 253:4 0 100G 0 lvm /var/log
├─system-var 253:5 0 50G 0 lvm /var
├─system-home 253:6 0 2T 0 lvm /home
├─system-lscratch 253:7 0 500G 0 lvm /scratch
├─system-tmp 253:8 0 200G 0 lvm /tmp
└─system-var_tmp 253:9 0 200G 0 lvm /var/tmp
[admin@cryo-worker ~]$ grep -E 'HugePages|Hugepagesize' /proc/meminfo
AnonHugePages: 2048 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 131072
HugePages_Free: 131072
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
TASK [Playbook complete — summary] **********************************************************************************************************************************************************************************************************
ok: [cryo-worker.niaid.nih.gov] => {
"msg": [
"================================================================",
" Tuning complete: cryo-worker.niaid.nih.gov",
"================================================================",
" Applied immediately (no reboot needed):",
" nvidia-persistenced, clock lock, power limits",
" NVMe scheduler=none, nr_requests=1024",
" 2MiB hugepages allocated, THP disabled",
" sysctl tuning (vm, net, fs, kernel)",
" tuned profile=throughput-performance, governor=performance",
" /scratch/cryosparc_cache created and permissioned",
"",
" Requires reboot:",
" 1GiB hugepages (hugepagesz=1G in GRUB cmdline)",
" ECC change (only if gpu_disable_ecc: true)",
"",
" When ready: sudo reboot",
"================================================================"
]
}
PLAY RECAP **********************************************************************************************************************************************************************************************************************************
cryo-worker.niaid.nih.gov : ok=55 changed=8 unreachable=0 failed=0 skipped=5 rescued=0 ignored=0