DrugANNs

Global AI Challenge solution

Overall pipeline:

data preprocessing (removing unneeded parts of the molecules)
generation of Morgan, MACCS and EState fingerprints
application of the MolCLR graph neural network
application of a RandomForest model to the fingerprint features listed above
merging and averaging of the models' predictions
passing the averaged predictions to the Lipinski rule checker
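The merge-and-average step can be sketched as an unweighted mean of per-molecule scores. This is a minimal sketch, not the repository's code: it assumes each model outputs one score per test molecule in the same order, and the 0.5 decision threshold and the `average_predictions` helper are hypothetical. In practice the RandomForest scores would presumably come from `np.load("rf_preds/rf_final_pred.npy")`.

```python
import numpy as np


def average_predictions(*pred_arrays):
    """Unweighted mean of per-molecule scores from several models (hypothetical helper)."""
    arrays = [np.asarray(p, dtype=float).ravel() for p in pred_arrays]
    # Averaging only makes sense if every model scored the same molecules.
    if len({a.shape for a in arrays}) != 1:
        raise ValueError("all prediction arrays must cover the same molecules")
    return np.mean(np.stack(arrays), axis=0)


# Illustrative scores for four test molecules (not real model output).
rf_scores = np.array([0.9, 0.2, 0.6, 0.1])
molclr_scores = np.array([0.7, 0.4, 0.2, 0.3])

ensemble = average_predictions(rf_scores, molclr_scores)
labels = (ensemble >= 0.5).astype(int)  # assumed 0.5 decision threshold
print(ensemble)
print(labels)
```

A plain mean gives both models equal weight; a weighted mean would be a natural variation if one model validated better.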

Repository structure

Run data_preprocessing.ipynb to produce canonical SMILES
Run ogb-rdk-transform.ipynb to get the preprocessed dataset
Go to the YouGraphRF folder and run python random_forest.py --smiles_file ... --smiles_test_file ...
Take the predictions from rf_preds/rf_final_pred.npy
Go to the MolCLR folder
Place the preprocessed molecule data in data/covid/COVID.csv and data/covid/COVID-test.csv for the train and test subsets, respectively.
Run python finetune_contrast.py
Finally, run predict-molclr.ipynb. Change the model path to point to your own checkpoint, or use the checkpoint from the submission, which is in the finetune folder.
Pass the final predictions to lipinski_rule_application.ipynb
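A Lipinski rule-of-five check can be sketched as follows. This is an illustrative sketch, not the contents of lipinski_rule_application.ipynb: the `MolProps` container, the helper names, the "at most one violation" convention and the policy of zeroing the scores of failing molecules are all assumptions. With RDKit, the descriptor values could be computed with `rdkit.Chem.Descriptors.MolWt`, `rdkit.Chem.Crippen.MolLogP`, `rdkit.Chem.Lipinski.NumHDonors` and `rdkit.Chem.Lipinski.NumHAcceptors`.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MolProps:
    """Descriptors needed for Lipinski's rule of five (hypothetical container)."""
    mol_weight: float   # molecular weight, Da
    logp: float         # octanol/water partition coefficient
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


def lipinski_violations(p: MolProps) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    return sum([
        p.mol_weight > 500,
        p.logp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ])


def passes_lipinski(p: MolProps, max_violations: int = 1) -> bool:
    """Common convention: a molecule passes with at most one violation."""
    return lipinski_violations(p) <= max_violations


def apply_lipinski_filter(scores: Sequence[float], props: Sequence[MolProps],
                          max_violations: int = 1) -> List[float]:
    """Zero out the scores of molecules that fail the rule (assumed policy)."""
    if len(scores) != len(props):
        raise ValueError("scores and props must have the same length")
    return [s if passes_lipinski(p, max_violations) else 0.0
            for s, p in zip(scores, props)]


# Hypothetical descriptor values for two molecules.
small = MolProps(mol_weight=180.2, logp=1.3, h_donors=1, h_acceptors=4)
large = MolProps(mol_weight=720.0, logp=6.2, h_donors=2, h_acceptors=12)

print(lipinski_violations(large))
print(apply_lipinski_filter([0.8, 0.9], [small, large]))
```

Keeping the rule as a post-processing filter leaves the models untouched, so the threshold on allowed violations can be tuned without retraining.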

You can find the requirements in the requirements.txt file.