bjodah/chempy

Numerical instability when solving equilibrium problems

jlmaccal opened this issue · 2 comments

I'm working on a biological system with the following reactions:

1. 50 A <-> A_50
2. A_50 + B <-> A_50 B

I want to determine [A_50 B] as a function of [A]_initial.

I have setup a series of Equilibrium objects and defined an EqSystem. Everything works as expected, however the results are numerically unstable for some combinations of equilibrium constants and concentrations.

I think the issue is the massive difference between the equilibrium constants for the two reactions (K_1 = 1e233, k_2=1e6). For some combinations of initial concentrations, this produces bogus results and gives Too much of at least one component errors.

I'm not sure if it's possible to improve the numerical stability, but it would really help if something could be done.

Here is a minimal example that gives this problem:

import numpy as np
from chempy import Substance
from chempy import Equilibrium
from chempy.equilibria import EqSystem
from matplotlib import pyplot as pp
from collections import OrderedDict

Km = 1 / 1e-6
CMC = 30e-6
N = 50
K = CMC**-N / N
print(K)

substances = OrderedDict([
    ('P', Substance('P', composition={'protein': 1}, latex_name='[P]')),
    ('S', Substance('S', composition={'surfactant': 1}, latex_name='[S]')),
    ('S50', Substance('S50', composition={'surfactant': N}, latex_name='[S50]')),
    ('PS50', Substance('PS50', composition={'surfactant': N, 'protein': 1}, latex_name='[PS50]')),
])

eq1 = Equilibrium({'S': N}, {'S50': 1}, K)
eq2 = Equilibrium({'S50': 1, 'P': 1}, {'PS50': 1}, Km)
eqsys = EqSystem([eq1, eq2], substances=substances)

prot_added = np.logspace(-6, 0, 500, base=10)

def prot_to_concs(conc):
    init_conc = {'P': conc, 'S': 100e-6, 'S50': 0, 'PS50': 0, }
    x, sol, sane = eqsys.root(init_conc)
    return x

results = [prot_to_concs(c) for c in prot_added]
prot = np.array([r[0] for r in results])
surf = np.array([r[1] for r in results])
mic = np.array([r[2] for r in results])
prot_mic = np.array([r[3] for r in results])

pp.plot(prot_added, surf*1e6, label='[S]')
pp.plot(prot_added, mic*1e6, label='[M]')
pp.plot(prot_added, prot_mic*1e6, label='[PM]')
pp.xscale('log')
pp.legend();
pp.xlabel("Protein Acid Added (M)")
pp.ylabel("Concentration (µM)")
pp.show()

Hi Justin,

Interesting, I've never encountered an equilibrium constant with that extreme dimensionality.
Since I haven't worked with those kinds of problems I can't say anything for sure, but I agree with your analysis. Also, double precision floating points have an maximum representable value of ~10**308 which some of these numbers then come uncomfortably close to (from a logarithmic point of view of course).

Does there exist an alternative formulation? It sounds not all different from https://en.wikipedia.org/wiki/Smoluchowski_coagulation_equation

An other possible approach would be to apply a variable transformation and work in the logarithmic space, I've experimented some with this, see e.g.:
https://github.com/bjodah/chempy/search?q=numsyslog&type=Code

Thanks, the NumSysLog does seem to help, although some instability still crops up.

Yes, the equilibrium constant is enormous, which is basically due to the 50:1 stoichiometry of aggregate formation. In reality, my problem is actually a bit more complex and there are other coupled equilibria involved. Capturing all of this in a numerically stable model is proving challenging.

At this point, you can probably close this issue.