3rd Place Solution (Team cgu)
Presentation
Team members: 구정현, 이단영, 김상엽
- We adopted the GeoGNN architecture (Fang et al., 2022) as the base molecule encoder.
- We revised GeoGNN to appropriately model the "difference" between the two energy states of a molecule (a minimal sketch of this idea follows this list).
- The training and test data can be downloaded from the 2022 Samsung AI Challenge (Materials Discovery).
- We assume that you have downloaded the dataset into the 'data' directory.
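The "difference" bullet above can be made concrete with a minimal sketch, which is an illustration of the idea rather than the team's actual code: a shared encoder embeds both 3D states, and a small head regresses the target from the difference of the two embeddings. The encoder here is a placeholder for any GeoGNN-style module that maps a molecular graph to a fixed-size embedding.

```python
from torch import nn

class DiffEnergyModel(nn.Module):
    """Predicts a scalar from the difference of two state embeddings (sketch)."""

    def __init__(self, encoder: nn.Module, hidden_dim: int):
        super().__init__()
        self.encoder = encoder  # shared weights: the same encoder sees both states
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state_a, state_b):
        h_a = self.encoder(state_a)  # embedding of the first 3D geometry
        h_b = self.encoder(state_b)  # embedding of the second 3D geometry
        return self.head(h_a - h_b).squeeze(-1)  # regress on the difference
```

Because the encoder weights are shared across both states, the head only has to learn what changes between the two geometries.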
python train.py configs/gem1.yaml
- Running the command above trains a model with the default hyperparameters we used in this competition.
- Trained model checkpoints and submission files (test_preds.csv) will be saved in the 'outputs' directory; you can submit the csv file directly (a quick sanity-check snippet follows this list).
- A single model already achieves strong performance (private LB score: 6.5–7.0), but for further improvement we used a stacking ensemble.
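Before uploading, it can help to sanity-check the generated file. Since the exact submission schema is competition-specific, this hypothetical snippet only inspects the shape and columns rather than assuming any particular layout.

```python
# Hypothetical sanity check of the generated submission before uploading;
# the schema (column names, row count) depends on the competition's template.
import pandas as pd

preds = pd.read_csv("outputs/test_preds.csv")
print(preds.head())
print(f"{len(preds)} rows, columns: {list(preds.columns)}")
assert preds.notna().all().all(), "submission contains missing values"
```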
bash run.sh
- Running the command above performs 10-fold CV for a total of 12 models (4 hyperparameter settings × 3 seeds). This may take a long time.
- If it runs successfully, prediction files (csv) for each fold's validation set and for the test set will be created.
- By running the code in 'stack_ensemble.ipynb', you can train XGBoost models on the stacked dataset and generate the ensembled submission file ('outputs/ensembled_submission.csv'). A minimal sketch of the stacking step is shown below.
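For reference, here is a minimal sketch of such a stacking step, assuming (hypothetically) that each base model's out-of-fold and test predictions were written to csv files with a 'pred' column. The file names, target column, and XGBoost hyperparameters below are illustrative assumptions, not the notebook's actual contents.

```python
import glob

import numpy as np
import pandas as pd
from xgboost import XGBRegressor

# Hypothetical file layout: one oof/test csv per base model.
oof_files = sorted(glob.glob("outputs/oof_model*.csv"))
test_files = sorted(glob.glob("outputs/test_model*.csv"))

# Level-0 features: one column per base model's predictions
# (out-of-fold rows are assumed to be aligned with train.csv row order).
X_oof = np.column_stack([pd.read_csv(f)["pred"].to_numpy() for f in oof_files])
X_test = np.column_stack([pd.read_csv(f)["pred"].to_numpy() for f in test_files])
y = pd.read_csv("data/train.csv")["target"].to_numpy()  # hypothetical target column

# Level-1 meta-model trained on the stacked out-of-fold predictions.
meta = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=4)
meta.fit(X_oof, y)

pd.DataFrame({"pred": meta.predict(X_test)}).to_csv(
    "outputs/ensembled_submission.csv", index=False
)
```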