Our solution consists of six weighted ensembles: one ensemble per target for molecules with shared (BRD4_shared, HSA_shared, sEH_shared) and non-shared (BRD4_nonshared, HSA_nonshared, sEH_nonshared) building blocks. Links to the training scripts for all models are grouped into four folders by model type: GBDT, chemberta, CNN, GNN.
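For illustration, here is a minimal sketch of how one such per-target blend might be computed. The model names and weights below are placeholders, not the tuned competition values, and this is not the actual blending code from the repository.

```python
# Illustrative sketch: weighted average of per-model predictions for one of
# the six (target, split) ensembles. Weights here are hypothetical.
import numpy as np

def blend(predictions: dict[str, np.ndarray], weights: dict[str, float]) -> np.ndarray:
    """Weighted average of predictions from several models for a single target/split."""
    total = sum(weights.values())
    return sum(weights[name] * predictions[name] for name in weights) / total

# e.g. the BRD4_nonshared ensemble mixing the four model families:
# blended = blend(
#     {"GBDT": p_gbdt, "chemberta": p_bert, "CNN": p_cnn, "GNN": p_gnn},
#     {"GBDT": 0.20, "chemberta": 0.30, "CNN": 0.25, "GNN": 0.25},
# )
```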
Please bear with us if some code needs additional manual tweaking to run. Models were trained in a variety of environments, including several local PCs, Kaggle, and rented servers on vast.ai (the latter using the default CUDA 12.4 PyTorch image).
Initial data preparation should be done by sequentially running prepare_data/make_train_test_split.R and then prepare_data/replace_dy.py (a minimal invocation sketch is given below). Please refer to GBDT/README.md for guidance on reproducing the solution's GBDT models.
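As a rough sketch, the two preparation steps could be run from the repository root as follows. This assumes Rscript and python are on PATH, the required packages are installed, and the scripts take no extra command-line arguments (check the scripts themselves for any required inputs).

```python
# Minimal sketch: run the two data-preparation scripts in order.
import subprocess

subprocess.run(["Rscript", "prepare_data/make_train_test_split.R"], check=True)
subprocess.run(["python", "prepare_data/replace_dy.py"], check=True)
```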