This repository contains the reproduction details for the publication on the wall-time efficient and robust optimal transport Gaussian Process dimer.
It is the third in a series of papers [1, 2] on the Dimer method, with a focus on the failure modes of the Gaussian Process Regression model on the 500-system benchmark of Hermes et al. [3].
If you use this repository or its parts, please cite the corresponding publication or data source.
Accepted at ChemPhysChem (in production).
R. Goswami and H. Jónsson, “Adaptive Pruning for Increased Robustness and Reduced Computational Overhead in Gaussian Process Accelerated Saddle Point Searches,” ChemPhysChem, Nov. 2025, doi: 10.1002/cphc.202500730.
A more accessible form of the same publication is:
R. Goswami and H. Jónsson, “Adaptive Pruning for Increased Robustness and Reduced Computational Overhead in Gaussian Process Accelerated Saddle Point Searches,” Oct. 08, 2025, arXiv:2505.13621. doi: 10.48550/arXiv.2505.13621.
At the Materials Cloud Archive:
R. Goswami and H. Jónsson, “Adaptive pruning for increased robustness and reduced computational overhead in gaussian process accelerated saddle point searches.” Materials Cloud, p. 405529315, Oct. 09, 2025. doi: 10.24435/MATERIALSCLOUD:RH-TW.
Remember to extract the data from the Materials Cloud Archive entry cited above before using the scripts in the repository. Assuming that the .xz files have been downloaded to data/ relative to the repository root:
export GITROOT="$(git rev-parse --show-toplevel)"
cd "$GITROOT/data"
tar -xf otgpd_alldat.tar.xz && rm -f otgpd_alldat.tar.xz
# Raw benchmark data, i.e., EON output logs
# Make sure the target directory exists so cp does not create a file named hpc
mkdir -p "$GITROOT/bench_runs/runs/hpc"
cp "$GITROOT/data/hpc.tar.xz" "$GITROOT/bench_runs/runs/hpc"
cd "$GITROOT/bench_runs/runs/hpc"
tar -xf hpc.tar.xz && rm -f hpc.tar.xz

The repository has code archives, benchmark runs, and scripts for analysis.
❯ tree -L 2
.
├── CODEOWNERS
├── docs
│   ├── 00_freeform.org
│   ├── 01_hpc.org
│   ├── 03_suppl_viz.org
│   ├── 04_models.org
│   └── 05_suppl_py.org
├── LICENSE
├── pixi.lock
├── pixi.toml
├── readme.org
├── runs
│   ├── automated
│   ├── calc_rundata.py
│   ├── init_condcheck.py
│   └── run_pf.py
├── scripts
│   ├── build_nwchem.sh
│   └── env_setup.sh
└── subrepos
    ├── chemparseplot
    ├── eOn
    ├── gpr_optim
    ├── IterativeRotationsAssignments
    ├── readme.org
    └── rgpycrumbs

The data in the archives expands to locations within the benchmarks.
Each benchmark has the following structure:
.
├── doublet
│   ├── 000
# .....
│   └── 234
└── singlet
    ├── 000
# .....
    └── 264

Together these comprise 500 systems.
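As a quick sanity check on an extracted benchmark, the system directories can be tallied per multiplicity. Going by the listing above, and assuming the numbering is contiguous, doublet/000 to doublet/234 gives 235 systems and singlet/000 to singlet/264 gives 265, for 500 in total. The helper name count_systems and the root path passed to it are hypothetical; this is a minimal sketch and not part of the repository scripts.

```shell
# Hypothetical helper: count the system directories of one benchmark.
# Usage: count_systems /path/to/extracted/benchmark
count_systems() {
    root="$1"
    total=0
    for mult in doublet singlet; do
        # Count only the immediate sub-directories (000, 001, ...)
        n=$(find "$root/$mult" -mindepth 1 -maxdepth 1 -type d | wc -l)
        n=$((n))  # normalise any padding that wc may add
        echo "$mult: $n"
        total=$((total + n))
    done
    echo "total: $total"
}
```

Calling count_systems on the root of a fully extracted benchmark should then report a total of 500 if the layout matches the structure shown.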
For comparisons:
- GPDimer runs
  - Extract from the relevant Materials Cloud archive.
- Dimer (rotation separated) runs
  - From this archive.
A reproducible setup for generating benchmarks discussed elsewhere [1 (GitHub), 2 (GitHub)].
- docs/ contains documentation.
- pixi tasks encapsulate dev tasks.
Each of the main tools is mirrored with git-subrepo to ensure reproducibility.
The raw data stores are located on the University of Iceland OneDrive instance and handled via a WebDAV interface to the store.
rclone serve webdav HIOneDrive:/.dvcstore --vfs-cache-mode full --addr localhost:9677

[1] R. Goswami, M. Masterov, S. Kamath, A. Pena-Torres, and H. Jónsson, “Efficient Implementation of Gaussian Process Regression Accelerated Saddle Point Searches with Application to Molecular Reactions,” J. Chem. Theory Comput., Jul. 2025, doi: 10.1021/acs.jctc.5c00866.
[2] R. Goswami, “Bayesian hierarchical models for quantitative estimates for performance metrics applied to saddle search algorithms,” AIP Adv., vol. 15, no. 8, p. 85210, Aug. 2025, doi: 10.1063/5.0283639.
[3] E. D. Hermes, K. Sargsyan, H. N. Najm, and J. Zádor, “Sella, an Open-Source Automation-Friendly Molecular Saddle Point Optimizer,” J. Chem. Theory Comput., vol. 18, no. 11, pp. 6974–6988, Nov. 2022, doi: 10.1021/acs.jctc.2c00395.
MIT. Sub-packages have their own licenses.