GitHub repository for the hands-on Snakemake learning session at the MannLabs Group Retreat 2025
Snakemake is a Python-based workflow manager that makes your life easier when analysing large datasets: it enforces reproducibility and enables scalability.
In this tutorial, we will
- read in a dataset (here: a small image)
- process it with a simple function (here: apply different image transformations to it)
- generate a plot as output (here: histograms of pixel intensities)
- generate a snakemake report.
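Stripped of the workflow machinery, the analysis itself is deliberately simple. The following is a rough, stand-alone Python sketch of the three steps; the example image, the transformation, and the file names are illustrative assumptions, and the actual scripts in this repository may differ:

```python
# Minimal sketch of the processing steps, independent of Snakemake.
# Image choice, transformation, and file names are illustrative only.
import matplotlib.pyplot as plt
from skimage import data, exposure, util

# 1) Read in a small example image shipped with scikit-image
image = data.camera()

# 2) Apply a simple transformation (here: invert the image)
transformed = util.invert(image)

# 3) Plot a histogram of the pixel intensities
hist, bin_centers = exposure.histogram(transformed)
plt.plot(bin_centers, hist)
plt.xlabel("Pixel intensity")
plt.ylabel("Count")
plt.savefig("histogram.png")
```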
- Using the command line, go into your favorite directory
  ```bash
  cd /path/to/my/favorite/directory
  ```
- Clone this repository (or download it via Code > Download ZIP, and unzip it locally)
  ```bash
  git clone https://github.com/lucas-diedrich/snakemake-learning.git
  ```
- Go into the directory
  ```bash
  cd snakemake-learning
  ```
- Create a `mamba`/`conda` environment with Snakemake based on the `environment.yaml` file and activate it
  ```bash
  mamba create -n snakemake-env --file environment.yaml && mamba activate snakemake-env
  # OR: conda env create -f environment.yaml && conda activate snakemake-env
  ```
- Check if the installation was successful
  ```bash
  snakemake --version
  > 9.5.1
  ```

See the slides in ./docs.
Run the following command in the root directory (.) to see the whole task graph:
```bash
# --dag: Directed acyclic graph
snakemake --dag
```
And the following command to inspect how the rules depend on one another (simpler than the task graph, especially for large workflows):
```bash
# --rulegraph: Show dependencies between rules
snakemake --rulegraph
```
```mermaid
---
title: Rule Graph
---
flowchart TB
id0[all]
id1[plot_histogram]
id2[transform_image]
id3[save_image]
style id0 fill:#CD5C5C,stroke-width:2px,color:#333333
style id1 fill:#F08080,stroke-width:2px,color:#333333
style id2 fill:#FA8072,stroke-width:2px,color:#333333
style id3 fill:#E9967A,stroke-width:2px,color:#333333
id1 --> id0
id2 --> id1
id3 --> id2
```
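Written out as Snakemake rules, the same chain could look roughly like the sketch below. File paths, wildcards, and all script names except create-data.py are illustrative assumptions; the actual rules in ./workflow may differ.

```python
# Sketch of a Snakefile with the rule chain shown in the graph above.
# Paths, wildcards, and the helper scripts (other than create-data.py)
# are assumptions for illustration only.
IMAGES = ["astronaut"]
TRANSFORMS = ["grayscale", "rotated"]

rule all:
    input:
        expand("results/histograms/{image}_{transform}.png",
               image=IMAGES, transform=TRANSFORMS)

rule save_image:
    output:
        "results/images/{image}.png"
    shell:
        "python scripts/create-data.py --image-name {wildcards.image} --output {output}"

rule transform_image:
    input:
        "results/images/{image}.png"
    output:
        "results/transformed/{image}_{transform}.png"
    shell:
        "python scripts/transform-image.py --input {input} --transform {wildcards.transform} --output {output}"

rule plot_histogram:
    input:
        "results/transformed/{image}_{transform}.png"
    output:
        "results/histograms/{image}_{transform}.png"
    shell:
        "python scripts/plot-histogram.py --input {input} --output {output}"
```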
You can use an online Graphviz visualizer/editor to view the task graph (the output of `snakemake --dag` is in dot format).
Go into the ./workflow directory and run:
```bash
snakemake --cores 2 --use-conda
```
The output can be found in the ./results directory.
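The conda-related flag only has an effect for rules that declare their own conda environment via the `conda` directive. A minimal sketch of such a declaration (the environment file name is a hypothetical example, not the actual workflow code):

```python
# Sketch: a rule with its own conda environment; the environment is only
# used when Snakemake is invoked with conda support enabled.
# "envs/skimage.yaml" is a hypothetical environment file.
rule save_image:
    output:
        "results/images/astronaut.png"
    conda:
        "envs/skimage.yaml"
    shell:
        "python scripts/create-data.py --image-name astronaut --output {output}"
```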
Go into the ./workflow directory and run:
```bash
snakemake --report ../results/report.html
```
The output can be found in the ./results directory.
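Result files are embedded in the report when they are marked with `report()` in the Snakefile. A minimal, hypothetical example of such a marking (category, caption file, and paths are assumptions, not the actual workflow code):

```python
# Sketch: marking a rule output for inclusion in the Snakemake report.
# The caption file, category name, and paths are illustrative assumptions.
rule plot_histogram:
    input:
        "results/transformed/astronaut_grayscale.png"
    output:
        report("results/histograms/astronaut_grayscale.png",
               category="Histograms",
               caption="report/histogram.rst")
    shell:
        "python scripts/plot-histogram.py --input {input} --output {output}"
```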
You can run this workflow on a high-performance computing cluster (here leveraging the Slurm workload manager). In this case, one Slurm job acts as a scheduler that submits individual rule executions as separate Slurm jobs. The snakemake-executor-plugin-slurm automatically handles the scheduling and submission of dependent jobs. Please check out the script ./workflow/snakemake.sbatch and the official Snakemake Slurm plugin documentation to learn more about the relevant flags and settings.
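Because each rule execution becomes its own Slurm job, per-rule resource requests in the Snakefile are translated into the corresponding job parameters by the executor plugin. A minimal sketch of such a declaration; the values and script name are placeholders, not the workflow's actual settings:

```python
# Sketch: per-rule resources that the Slurm executor maps to job parameters.
# The script name and all values below are illustrative placeholders.
rule transform_image:
    input:
        "results/images/astronaut.png"
    output:
        "results/transformed/astronaut_grayscale.png"
    resources:
        mem_mb=2000,        # requested memory in MB
        runtime=15,         # walltime in minutes
        cpus_per_task=2     # CPUs allocated to the job
    shell:
        "python scripts/transform-image.py --input {input} --output {output}"
```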
Install the environment:
```bash
conda create -n snakemake-env -y
conda env update -n snakemake-env --file environment.yaml
```
Additionally install the snakemake-executor-plugin-slurm:
```bash
pip install snakemake-executor-plugin-slurm
```
Then submit the provided workflow script on a cluster:
```bash
cd workflow/
sbatch snakemake.sbatch
```
To further deepen your understanding after the workshop, you can try the following exercises.
The script `create-data.py` can take image names (that are part of the skimage package) as arguments:
```bash
python scripts/create-data.py --image-name <image name> --output <output name>
```
Modify the workflow so that it additionally runs on other skimage example datasets, e.g. colorwheel, cat, logo.
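One possible starting point for this exercise (not the only solution) is to collect the image names in one place and expand over them in the target rule; the variable name and output pattern below are assumptions:

```python
# Sketch of one way to run the workflow on several skimage example images.
# The list name and the output path pattern are illustrative assumptions.
IMAGES = ["astronaut", "colorwheel", "cat", "logo"]

rule all:
    input:
        expand("results/histograms/{image}.png", image=IMAGES)
```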
Add a new rule that generates an aggregated plot, where the image and its modifications are shown in the top row and the associated histograms in the bottom row.
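For the aggregated figure, a 2 x N subplot grid is one natural layout. The sketch below uses matplotlib directly and is only meant as a starting point; the chosen images and the output file name are assumptions:

```python
# Sketch: 2-row layout with images on top and their histograms below.
# The chosen images and output file name are illustrative assumptions.
import matplotlib.pyplot as plt
from skimage import data, exposure, util

images = {"original": data.camera()}
images["inverted"] = util.invert(images["original"])

fig, axes = plt.subplots(2, len(images), figsize=(4 * len(images), 8))
for col, (name, img) in enumerate(images.items()):
    axes[0, col].imshow(img, cmap="gray")
    axes[0, col].set_title(name)
    axes[0, col].axis("off")
    hist, bin_centers = exposure.histogram(img)
    axes[1, col].plot(bin_centers, hist)
    axes[1, col].set_xlabel("Pixel intensity")
fig.savefig("aggregated.png")
```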
Explore possibilities to modify the report with the reStructuredText (RST) format.
- Snakemake homepage + documentation: snakemake.readthedocs.io
- Publication: Mölder F, Jablonski KP, Letcher B et al. Sustainable data analysis with Snakemake [version 2; peer review: 2 approved]. F1000Research 2021, 10:33 (https://doi.org/10.12688/f1000research.29032.2)
