
C++-based CHRR implementation #1453

Open · ripaul wants to merge 7 commits into base: devel
Conversation

@ripaul commented Jul 4, 2025

TL;DR

By popular demand, this PR implements the CHRR (Coordinate Hit-and-Run with Rounding) algorithm for asymptotically unbiased flux sampling. The implementation is very fast because it builds on the hopsy module, which provides a Python interface to the C++ sampling algorithms from HOPS.

Changes

This PR adds a HopsySampler class that inherits from HRSampler and interfaces with the hopsy library. The HopsySampler applies a rounding transformation and then samples the polytope defined by the COBRApy model with the uniform Coordinate Hit-and-Run algorithm.
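
For illustration, the hopsy workflow behind such a sampler looks roughly like the following (a minimal sketch against hopsy's public API, not the PR's actual code; A and b stand for the inequality constraints of the flux polytope and here are just a toy stand-in):

import hopsy
import numpy as np

# Flux polytope {x : A x <= b}; here a toy 2-D unit box as a stand-in
# for the constraints extracted from the COBRApy model.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])

problem = hopsy.Problem(A, b)
problem = hopsy.round(problem)  # rounding transformation (PolyRound under the hood)

# One Markov chain per process, each with its own RNG
chains = [
    hopsy.MarkovChain(problem, proposal=hopsy.UniformCoordinateHitAndRunProposal)
    for _ in range(4)
]
rngs = [hopsy.RandomNumberGenerator(seed=42 + i) for i in range(4)]

# Draw 1000 states per chain with thinning; for a rounded problem the
# states should be mapped back to the original space by hopsy.
acceptance_rates, states = hopsy.sample(chains, rngs, n_samples=1_000, thinning=10)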

Based on the improved performance at large sample counts, the default method in cobra.sampling.sample is changed to chrr, provided the hopsy package is available. If it is not, we fall back to the OptGP sampler and raise a warning, as sketched below.
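
A minimal sketch of that fallback pattern, with illustrative names (the PR's actual function and flag names may differ):

from warnings import warn

try:
    import hopsy  # optional dependency
    HOPSY_AVAILABLE = True
except ImportError:
    HOPSY_AVAILABLE = False

def resolve_method(method: str = "chrr") -> str:
    """Fall back to OptGP with a warning when hopsy is not installed."""
    if method == "chrr" and not HOPSY_AVAILABLE:
        warn("hopsy is not installed; falling back to the 'optgp' sampler.")
        return "optgp"
    return method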

Since hopsy is a pre-compiled package, guaranteeing its availability on all platforms is not easy. It is therefore added only as an optional dependency, with a corresponding installation option cobra[chrr].
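
The extra can then be installed alongside COBRApy:

pip install "cobra[chrr]"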

The tests were extended to cover CHRR.

Performance

A small test example shows the performance benefits from using hopsy's CHRR implementation:

from cobra.io import load_model
from cobra.sampling import sample

import hopsy
import numpy as np
import matplotlib.pyplot as plt

import time

model = load_model("e_coli_core")

seed = 42
methods = ['chrr', 'optgp']
thinning, n_chains = 100, 4
n_samples = [1_000, 5_000, 10_000, 50_000, 100_000]

samples = {}  # keeps the final (largest-n) run per method for the plots below
ess, rhat, elapsed = {}, {}, {}
for m in methods:
    ess[m], rhat[m], elapsed[m] = [], [], []
    for n in n_samples:
        _elapsed = time.time()
        df = sample(model, n=n, processes=n_chains, method=m, thinning=thinning, seed=seed)
        _elapsed = time.time() - _elapsed

        names = df.columns  # reaction identifiers, used as subplot titles below
        samples[m] = df.values.reshape(n_chains, n // n_chains, -1)

        ess[m].append(hopsy.ess(samples[m]).flatten())
        rhat[m].append(hopsy.rhat(samples[m]).flatten())
        elapsed[m].append(_elapsed)
        
        print(m, n, ess[m][-1].mean(), ess[m][-1].min(), elapsed[m][-1])
    ess[m], rhat[m], elapsed[m] = np.array(ess[m]), np.array(rhat[m]), np.array(elapsed[m])    

for i, m in enumerate(methods):
    mean, std = np.mean(ess[m], axis=1), np.std(ess[m], axis=1)
    color = plt.cm.tab10(i)
    plt.plot(n_samples, mean / elapsed[m], color=color, label=m+' (mean)')
    plt.fill_between(n_samples, (mean+std) / elapsed[m], (mean-std) / elapsed[m], color=color, alpha=.2)
    plt.plot(n_samples, np.min(ess[m], axis=1) / elapsed[m], color=color, linestyle='--', label=m+' (min)')
plt.xlabel('number of samples')
plt.ylabel(r'effective samples per second $\rightarrow$')
plt.legend()
plt.show()

for i, m in enumerate(methods):
    plt.plot(n_samples, elapsed[m], color=plt.cm.tab10(i), label=m)
plt.xlabel('number of samples')
plt.ylabel(r'elapsed time [s] $\leftarrow$')
plt.legend()
plt.show()

dims = samples['chrr'].shape[-1]
nrows = ncols = int(np.ceil(np.sqrt(dims)))
fig, axs = plt.subplots(nrows, ncols, figsize=(3 * ncols, 3 * nrows), dpi=100)
d = 0
for i in range(nrows):
    for j in range(ncols):
        if d < dims:
            axs[i, j].set_title(names[d])
            for k, method in enumerate(samples):
                s = samples[method]
                # ESS of the largest run, which is the one stored and plotted here
                label = f'ess={round(ess[method][-1, d])}'
                axs[i, j].hist(s[:, :, d].T, histtype='step', linewidth=2,
                               color=[f'C{k}' for _ in range(s.shape[0])],
                               density=True, label=label)
                axs[i, j].legend()
            d += 1
        else:
            axs[i, j].axis('off')
        axs[i, j].set_yscale('log')
fig.subplots_adjust(wspace=.25, hspace=.4)
handles, _ = axs[0,0].get_legend_handles_labels()
fig.legend(handles, methods, loc='lower right', bbox_to_anchor=(.8, .15), ncols=len(methods))
plt.show()

The results shown were obtained on a 16-core AMD Ryzen Threadripper PRO 3955WX.
[Figure: effective samples per second vs. number of samples]
[Figure: elapsed time vs. number of samples]
[Figure: marginal flux distributions per reaction for CHRR and OptGP]

The main cost of CHRR comes from the rounding transformation, which in this case is computed using PolyRound, optlang, and Gurobi. Porting the rounding transformation to C++ may improve performance further.
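
Conceptually, the rounding step computes an affine change of variables that makes the polytope well-conditioned; samples drawn in the rounded space are then mapped back to flux space. A minimal sketch (T and shift are what the rounding procedure computes; the names are illustrative, not PolyRound's actual API):

import numpy as np

def to_flux_space(rounded_samples: np.ndarray, T: np.ndarray, shift: np.ndarray) -> np.ndarray:
    """Map samples y from the rounded polytope back to flux space via x = T y + shift."""
    return rounded_samples @ T.T + shift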

Discussion

Regarding the final plot of the marginal flux distributions, it is not entirely clear to me what happens with the GLNabc flux, where OptGP seems to mix very poorly (as measured by the ESS); this explains OptGP's bad performance when considering the minimum ESS across all dimensions. In particular, it would be important to verify which of the samplers targets the correct distribution here. While I am very confident that hopsy samples correctly in principle, I might have overlooked something in the problem setup, so I would appreciate some double-checking there. Testing problems other than e_coli_core would also be worthwhile.

@cdiener (Member) commented Jul 21, 2025

Sorry for the delay, I am currently on paternity leave. I will try to review in the next weeks.

I think it's a great addition and indeed something people have asked for a lot. Regarding the GLNabc flux: this happens because, in the other implementations, fluxes whose bound difference is smaller than the solver tolerance are considered fixed. So a flux with bounds [-1e-12, 1e-12] would be fixed to zero.
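
For context, a minimal sketch of the bound-collapsing behavior described above (the tolerance value is illustrative):

tolerance = 1e-9  # illustrative solver tolerance
lb, ub = -1e-12, 1e-12
if ub - lb < tolerance:
    lb = ub = 0.0  # the flux is treated as fixed at zero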

Successfully merging this pull request may close these issues.

Add CHRR method to sampling