TheochemUI/otgpd_repro


About

This repository contains the reproduction details for the publication on the wall-time-efficient and robust optimal transport Gaussian Process dimer (OTGPD).

It is the third in a series of papers [1, 2] on the dimer method, and focuses on the failure modes of the Gaussian Process Regression model on the 500-system benchmark of Hermes et al. [3].

Reference

If you use this repository or any of its parts, please cite the corresponding publication or data source.

Primary Publication

Accepted at ChemPhysChem (in production).

R. Goswami and H. Jónsson, “Adaptive Pruning for Increased Robustness and Reduced Computational Overhead in Gaussian Process Accelerated Saddle Point Searches,” ChemPhysChem, Nov. 2025, doi: 10.1002/cphc.202500730.

Preprint

A more accessible form of the same publication is:

R. Goswami and H. Jónsson, “Adaptive Pruning for Increased Robustness and Reduced Computational Overhead in Gaussian Process Accelerated Saddle Point Searches,” Oct. 08, 2025, arXiv:2505.13621. doi: 10.48550/arXiv.2505.13621.

Data source

At the Materials Cloud Archive.

R. Goswami and H. Jónsson, “Adaptive pruning for increased robustness and reduced computational overhead in gaussian process accelerated saddle point searches.” Materials Cloud, p. 405529315, Oct. 09, 2025. doi: 10.24435/MATERIALSCLOUD:RH-TW.

Replication data

Before using the scripts in the repository, extract the data obtained from the Materials Cloud Archive (see the Data source section). The commands below assume that the .xz files have been downloaded to data/ relative to the repository root:

export GITROOT="$(git rev-parse --show-toplevel)"
cd "$GITROOT/data"
tar -xf otgpd_alldat.tar.xz && rm -f otgpd_alldat.tar.xz
# Raw benchmark data, i.e., EON output logs
# Create the target directory first so that cp does not write a file named "hpc"
mkdir -p "$GITROOT/bench_runs/runs/hpc"
cp "$GITROOT/data/hpc.tar.xz" "$GITROOT/bench_runs/runs/hpc/"
cd "$GITROOT/bench_runs/runs/hpc"
tar -xf hpc.tar.xz && rm -f hpc.tar.xz
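
Since the extraction steps delete each archive once it is unpacked, it can help to confirm that the downloads are in place before starting. The following is a minimal sketch: check_archives is a hypothetical helper, not part of the repository, and the usage line assumes that both archives were downloaded separately into data/.

```shell
# Hypothetical helper (not shipped with the repository): report any archive
# that is missing from a directory and return non-zero if one is absent.
check_archives() {
    local data_dir="$1"
    shift
    local missing=0 name
    for name in "$@"; do
        if [ ! -f "$data_dir/$name" ]; then
            echo "missing archive: $data_dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (assumes GITROOT is exported as in the commands above):
#   check_archives "$GITROOT/data" otgpd_alldat.tar.xz hpc.tar.xz && echo "ready to extract"
```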

Structure

The repository has code archives, benchmark runs, and scripts for analysis.

❯ tree -L 2
.
├── CODEOWNERS
├── docs
│   ├── 00_freeform.org
│   ├── 01_hpc.org
│   ├── 03_suppl_viz.org
│   ├── 04_models.org
│   └── 05_suppl_py.org
├── LICENSE
├── pixi.lock
├── pixi.toml
├── readme.org
├── runs
│   ├── automated
│   ├── calc_rundata.py
│   ├── init_condcheck.py
│   └── run_pf.py
├── scripts
│   ├── build_nwchem.sh
│   └── env_setup.sh
└── subrepos
    ├── chemparseplot
    ├── eOn
    ├── gpr_optim
    ├── IterativeRotationsAssignments
    ├── readme.org
    └── rgpycrumbs

The data in the archives expands to locations within the benchmark directories.

Each benchmark has the following structure:

.
├── doublet
│   ├── 000
│   ├── ...
│   └── 234
└── singlet
    ├── 000
    ├── ...
    └── 264

Together these comprise 500 systems (235 doublets and 265 singlets).
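
A quick way to confirm that an extracted benchmark is complete is to count the system directories per spin state. This is a sketch under stated assumptions: count_systems is a hypothetical helper, not part of the repository, and the benchmark path is passed in by the caller.

```shell
# Hypothetical helper (not shipped with the repository): count the system
# directories under doublet/ and singlet/ of one benchmark directory.
count_systems() {
    local bench_dir="$1"
    local spin n total=0
    for spin in doublet singlet; do
        # Only direct subdirectories (000, 001, ...) are counted.
        n=$(find "$bench_dir/$spin" -mindepth 1 -maxdepth 1 -type d | wc -l)
        n=$((n))  # normalise the whitespace padding some wc builds emit
        echo "$spin: $n"
        total=$((total + n))
    done
    echo "total: $total"
}

# Usage: count_systems path/to/benchmark
```

If the numbering shown above is contiguous, a complete benchmark reports 235 doublets, 265 singlets, and a total of 500.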

For comparisons:

  • GPDimer runs: extract from the relevant Materials Cloud archive.
  • Dimer (rotation separated) runs: from this archive.

Usage

This repository provides a reproducible setup for generating the benchmarks discussed elsewhere [1 (GitHub), 2 (GitHub)].

Setup

  • docs/ contains documentation.
  • pixi tasks encapsulate dev tasks.

Design

Sub-repositories

Each of the main tools is mirrored with git-subrepo to ensure reproducibility.
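
git-subrepo records the upstream remote, branch, and commit of each mirrored tool in a .gitrepo file inside the mirrored directory. The sketch below lists the pinned commit for every directory under subrepos/. Here list_subrepo_pins is a hypothetical helper, not part of the repository, and it assumes each mirrored tool carries its .gitrepo file at the top of its directory.

```shell
# Hypothetical helper (not shipped with the repository): print
# "<subrepo name> <pinned upstream commit>" for each mirrored tool.
list_subrepo_pins() {
    local root="$1"
    local meta name commit
    for meta in "$root"/subrepos/*/.gitrepo; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$meta" ] || continue
        name=$(basename "$(dirname "$meta")")
        # .gitrepo uses git-config syntax, e.g. "commit = <sha>".
        commit=$(awk '$1 == "commit" && $2 == "=" { print $3 }' "$meta")
        echo "$name $commit"
    done
}

# Usage: list_subrepo_pins "$(git rev-parse --show-toplevel)"
```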

DVC

The raw data stores are hosted on the University of Iceland OneDrive instance and accessed through a WebDAV interface, which rclone can serve locally:

rclone serve webdav HIOneDrive:/.dvcstore --vfs-cache-mode full --addr localhost:9677

References

[1] R. Goswami, M. Masterov, S. Kamath, A. Pena-Torres, and H. Jónsson, “Efficient Implementation of Gaussian Process Regression Accelerated Saddle Point Searches with Application to Molecular Reactions,” J. Chem. Theory Comput., Jul. 2025, doi: 10.1021/acs.jctc.5c00866.

[2] R. Goswami, “Bayesian hierarchical models for quantitative estimates for performance metrics applied to saddle search algorithms,” AIP Adv., vol. 15, no. 8, p. 85210, Aug. 2025, doi: 10.1063/5.0283639.

[3] E. D. Hermes, K. Sargsyan, H. N. Najm, and J. Zádor, “Sella, an Open-Source Automation-Friendly Molecular Saddle Point Optimizer,” J. Chem. Theory Comput., vol. 18, no. 11, pp. 6974–6988, Nov. 2022, doi: 10.1021/acs.jctc.2c00395.

License

MIT. Sub-packages have their own licenses.
