Algorithm crashes when starting from a single ancestor
I am using the default MolGA for molecular optimisation, starting from a single molecule with an offspring size of 10 per cycle. My scoring function works fine on a list of SMILES, but the algorithm seems to crash when uniform_qualitle_sampling passes an empty list of eligible_population for sampling.
Are there any hyperparameters or tweaks to the algorithms one could employ to fix this?
Thanks for using MolGA @ruslankotl! If uniform_qualitle_sampling is asked to sample from an empty list then crashing is expected behaviour, but I'm not sure what exactly is causing an empty list to be passed. Can you provide a minimal example which reproduces the issue so that I can investigate (and determine whether it is a bug or a configuration issue)?
Thank you for your quick reply @AustinT! I am using a custom scorer that takes a list of SMILES and returns a list of scores. The example is as follows:
import joblib
import csv
from scoring.score_dp5 import DP5
from scoring.transform import reverse_sigmoid
import random
from mol_ga import mol_libraries, default_ga
# Function to optimize: a custom DP5-based scorer (the repository's example script uses QED here).
# mol_ga is designed for batch functions so it inputs a list of SMILES and outputs a list of floats.
f_dp5 = DP5('cmae', nmr_file='/home/rk582/MARGARITA/S11/S11_NMR')
f_opt = lambda x: reverse_sigmoid(f_dp5(x), low=0, high=7, k=0.7)
# Starting molecules: a single ancestor
# (the repository's example script samples random molecules from ZINC instead)
start_smiles = ['c1ccc(C2=NCCCCO2)cc1']
# Run GA with fast parallel generation
with joblib.Parallel(n_jobs=-1) as parallel:
    ga_results = default_ga(
        starting_population_smiles=start_smiles,
        scoring_function=f_opt,
        max_generations=100,
        offspring_size=10,
        parallel=parallel,
    )
Hi @ruslankotl, I don't have access to your custom scoring library, so I cannot run your code and reproduce your issue...
I noticed that your code is based on the example script from the repository, so I tried running that with the same starting SMILES as you, but it never encountered any problem.
One possibility is that your scoring function is returning nan values. If that is happening, maybe the >= check in this line always returns false. Could that be the issue?
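To illustrate the suspected mechanism, here is a minimal sketch using only the standard library. It does not call MolGA itself: the population layout (a list of (score, SMILES) pairs), the threshold filter, and the final draw with random.choices are assumptions based on the names mentioned in this thread, not a copy of the library code.

```python
import math
import random

# Hypothetical one-molecule population of (score, SMILES) pairs
# in which the scoring function failed and returned nan.
population = [(math.nan, "c1ccc(C2=NCCCCO2)cc1")]

# Any threshold computed from nan scores is itself nan. Even a finite
# threshold would not help, because every comparison with nan is False.
score_threshold = math.nan

# Assumed shape of the eligibility filter: keep molecules at or above the threshold.
eligible_population = [
    smiles for score, smiles in population if score >= score_threshold
]
print(eligible_population)                    # []
print(math.nan >= math.nan, math.nan >= 0.0)  # False False

# Drawing from an empty list is what finally raises.
crashed = False
try:
    random.Random(0).choices(population=eligible_population, k=1)
except IndexError:
    crashed = True
print(crashed)  # True
```

With a single ancestor there is no other molecule to fall back on, so one nan score is enough to empty the eligible list.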
Hi @AustinT
The function is indeed returning nan. I have changed that behaviour so it returns 0 upon failure, and MolGA now throws no exceptions.
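For anyone who cannot easily change their scorer, the same workaround can be sketched as a wrapper around any batch scoring function. The names nan_safe and fallback are hypothetical and not part of MolGA, and whether 0 is a sensible fallback depends on the score range (it works here on the assumption that higher scores are better and valid scores are non-negative).

```python
import math
from typing import Callable, List, Sequence


def nan_safe(
    scoring_function: Callable[[Sequence[str]], Sequence[float]],
    fallback: float = 0.0,
) -> Callable[[Sequence[str]], List[float]]:
    """Wrap a batch scorer so that failed scores (None or nan) become `fallback`."""

    def wrapped(smiles_list: Sequence[str]) -> List[float]:
        scores = scoring_function(smiles_list)
        # Check for None first, because math.isnan(None) would raise TypeError.
        return [
            fallback if (s is None or math.isnan(s)) else float(s)
            for s in scores
        ]

    return wrapped


# Stand-in scorer for illustration only: it "fails" on the empty string.
def toy_scorer(smiles_list: Sequence[str]) -> List[float]:
    return [math.nan if not smi else float(len(smi)) for smi in smiles_list]


safe_scorer = nan_safe(toy_scorer)
print(safe_scorer(["CCO", ""]))  # [3.0, 0.0]
```

In the script above, this would amount to passing scoring_function=nan_safe(f_opt) instead of scoring_function=f_opt.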
Great to hear! I will push a fix later to throw a more transparent error message though.
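Until such a check is available in the library, a user-side version could look roughly like the sketch below. This is not the fix referred to above; check_scores is a hypothetical helper that only shows the kind of early, explicit failure that would have made this issue easier to diagnose.

```python
import math
from typing import List, Sequence


def check_scores(smiles_list: Sequence[str], scores: Sequence[float]) -> List[float]:
    """Return the scores as a list, raising ValueError if any of them is nan."""
    if len(scores) != len(smiles_list):
        raise ValueError(
            f"Scoring function returned {len(scores)} scores "
            f"for {len(smiles_list)} SMILES."
        )
    bad = [smi for smi, s in zip(smiles_list, scores) if math.isnan(s)]
    if bad:
        raise ValueError(
            f"Scoring function returned nan for {len(bad)} SMILES, e.g. {bad[0]!r}. "
            "nan scores cannot be ranked, so sampling would see no eligible molecules."
        )
    return list(scores)


# Usage: valid scores pass through unchanged, a nan score raises immediately.
print(check_scores(["CCO", "CCN"], [0.4, 0.7]))  # [0.4, 0.7]
try:
    check_scores(["c1ccc(C2=NCCCCO2)cc1"], [math.nan])
except ValueError as err:
    message = str(err)
    print(message)
```

Raising at scoring time points at the scorer directly, rather than at the sampling step several calls later.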