
C++-based CHRR implementation #1453

Open · ripaul wants to merge 7 commits into base: devel
Conversation

@ripaul commented Jul 4, 2025

TL;DR

By popular demand, this PR implements the CHRR (Coordinate Hit-and-Run with Rounding) algorithm for asymptotically unbiased flux sampling. The implementation is very fast because it builds on the hopsy module, which provides a Python interface to the C++ sampling algorithms from HOPS.

Changes

This PR adds a HopsySampler class that inherits from HRSampler and interfaces with the hopsy library. The HopsySampler applies a rounding transformation and then samples the polytope defined by the COBRApy model with the uniform Coordinate Hit-and-Run algorithm.
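
For illustration, the hopsy workflow behind such a sampler looks roughly like the following (a minimal sketch against hopsy's public API, not the PR's actual code; A and b stand for the inequality constraints of the flux polytope and here are just a toy stand-in):

import hopsy
import numpy as np

# Flux polytope {x : A x <= b}; here a toy 2-D unit box as a stand-in
# for the constraints extracted from the COBRApy model.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])

problem = hopsy.Problem(A, b)
problem = hopsy.round(problem)  # rounding transformation (PolyRound under the hood)

# One Markov chain per process, each with its own RNG
chains = [
    hopsy.MarkovChain(problem, proposal=hopsy.UniformCoordinateHitAndRunProposal)
    for _ in range(4)
]
rngs = [hopsy.RandomNumberGenerator(seed=42 + i) for i in range(4)]

# Draw 1000 states per chain with thinning; for a rounded problem the
# states should be mapped back to the original space by hopsy.
acceptance_rates, states = hopsy.sample(chains, rngs, n_samples=1_000, thinning=10)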

Based on the improved performance at large sample counts, the default method in cobra.sampling.sample is changed to chrr, provided the hopsy package is available. If it is not, we fall back to the OptGP sampler and raise a warning, as sketched below.
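
A minimal sketch of that fallback pattern, with illustrative names (the PR's actual function and flag names may differ):

from warnings import warn

try:
    import hopsy  # optional dependency
    HOPSY_AVAILABLE = True
except ImportError:
    HOPSY_AVAILABLE = False

def resolve_method(method: str = "chrr") -> str:
    """Fall back to OptGP with a warning when hopsy is not installed."""
    if method == "chrr" and not HOPSY_AVAILABLE:
        warn("hopsy is not installed; falling back to the 'optgp' sampler.")
        return "optgp"
    return method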

Since hopsy is a pre-compiled package, guaranteeing its availability on all platforms is not easy. It is therefore added only as an optional dependency, with a corresponding installation option cobra[chrr].
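
The extra can then be installed alongside COBRApy:

pip install "cobra[chrr]"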

The tests were extended to cover CHRR.

Performance

A small test example shows the performance benefits from using hopsy's CHRR implementation:

from cobra.io import load_model
from cobra.sampling import sample

import hopsy
import numpy as np
import matplotlib.pyplot as plt

import time

model = load_model("e_coli_core")

seed = 42
methods = ['chrr', 'optgp']
thinning, n_chains = 100, 4
n_samples = [1_000, 5_000, 10_000, 50_000, 100_000]

samples = {}  # keeps the final (largest-n) run per method for the plots below
ess, rhat, elapsed = {}, {}, {}
for m in methods:
    ess[m], rhat[m], elapsed[m] = [], [], []
    for n in n_samples:
        _elapsed = time.time()
        df = sample(model, n=n, processes=n_chains, method=m, thinning=thinning, seed=seed)
        _elapsed = time.time() - _elapsed

        names = df.columns  # reaction identifiers, used as subplot titles below
        samples[m] = df.values.reshape(n_chains, n // n_chains, -1)

        ess[m].append(hopsy.ess(samples[m]).flatten())
        rhat[m].append(hopsy.rhat(samples[m]).flatten())
        elapsed[m].append(_elapsed)
        
        print(m, n, ess[m][-1].mean(), ess[m][-1].min(), elapsed[m][-1])
    ess[m], rhat[m], elapsed[m] = np.array(ess[m]), np.array(rhat[m]), np.array(elapsed[m])    

for i, m in enumerate(methods):
    mean, std = np.mean(ess[m], axis=1), np.std(ess[m], axis=1)
    color = plt.cm.tab10(i)
    plt.plot(n_samples, mean / elapsed[m], color=color, label=m+' (mean)')
    plt.fill_between(n_samples, (mean+std) / elapsed[m], (mean-std) / elapsed[m], color=color, alpha=.2)
    plt.plot(n_samples, np.min(ess[m], axis=1) / elapsed[m], color=color, linestyle='--', label=m+' (min)')
plt.xlabel('number of samples')
plt.ylabel(r'effective samples per second $\rightarrow$')
plt.legend()
plt.show()

for i, m in enumerate(methods):
    plt.plot(n_samples, elapsed[m], color=plt.cm.tab10(i), label=m)
plt.xlabel('number of samples')
plt.ylabel(r'elapsed time [s] $\leftarrow$')
plt.legend()
plt.show()

dims = samples['chrr'].shape[-1]
nrows = ncols = int(np.ceil(np.sqrt(dims)))
fig, axs = plt.subplots(nrows, ncols, figsize=(3 * ncols, 3 * nrows), dpi=100)
d = 0
for i in range(nrows):
    for j in range(ncols):
        if d < dims:
            axs[i, j].set_title(names[d])
            for k, method in enumerate(samples):
                s = samples[method]
                # ESS of the largest run, which is the one stored and plotted here
                label = f'ess={round(ess[method][-1, d])}'
                axs[i, j].hist(s[:, :, d].T, histtype='step', linewidth=2,
                               color=[f'C{k}' for _ in range(s.shape[0])],
                               density=True, label=label)
                axs[i, j].legend()
            d += 1
        else:
            axs[i, j].axis('off')
        axs[i, j].set_yscale('log')
fig.subplots_adjust(wspace=.25, hspace=.4)
handles, _ = axs[0,0].get_legend_handles_labels()
fig.legend(handles, methods, loc='lower right', bbox_to_anchor=(.8, .15), ncols=len(methods))
plt.show()

The results shown were obtained on a 16-core AMD Ryzen Threadripper PRO 3955WX.
[Figure: effective samples per second vs. number of samples]
[Figure: elapsed time vs. number of samples]
[Figure: marginal flux distributions per reaction for CHRR and OptGP]

The main cost of CHRR comes from the rounding transformation, which in this case is computed using PolyRound, optlang, and Gurobi. Porting the rounding transformation to C++ may improve performance further.
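
Conceptually, the rounding step computes an affine change of variables that makes the polytope well-conditioned; samples drawn in the rounded space are then mapped back to flux space. A minimal sketch (T and shift are what the rounding procedure computes; the names are illustrative, not PolyRound's actual API):

import numpy as np

def to_flux_space(rounded_samples: np.ndarray, T: np.ndarray, shift: np.ndarray) -> np.ndarray:
    """Map samples y from the rounded polytope back to flux space via x = T y + shift."""
    return rounded_samples @ T.T + shift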

Discussion

Regarding the final plot of the marginal flux distributions, it is not entirely clear to me what happens with the GLNabc flux, where OptGP seems to mix very poorly (as measured by the ESS); this explains OptGP's bad performance when considering the minimum ESS across all dimensions. In particular, it would be important to verify which of the samplers targets the correct distribution here. While I am very confident that hopsy samples correctly in principle, I might have overlooked something in the problem setup, so I would appreciate some double-checking there. Testing problems other than e_coli_core would also be worthwhile.

@cdiener (Member) commented Jul 21, 2025

Sorry for the delay, I am currently on paternity leave. I will try to review in the next weeks.

I think it's a great addition and indeed something people have asked for a lot. Regarding the GLNabc flux: this happens because, in the other implementations, fluxes whose bound difference is smaller than the solver tolerance are considered fixed. So a flux with bounds [-1e-12, 1e-12] would be fixed to zero.
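
For context, a minimal sketch of the bound-collapsing behavior described above (the tolerance value is illustrative):

tolerance = 1e-9  # illustrative solver tolerance
lb, ub = -1e-12, 1e-12
if ub - lb < tolerance:
    lb = ub = 0.0  # the flux is treated as fixed at zero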

Successfully merging this pull request may close these issues.

Add CHRR method to sampling