We have tested our code in an environment with:

- Python 3.8.13
- OS: CentOS Linux release 7.9.2009 (Core) (LSB Version :core-4.1-amd64:core-4.1-noarch)
- CUDA 11.1 (Cuda compilation tools, release 11.1, V11.1.105, build cuda_11.1.TC455_06.29190527_0)
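Before launching anything heavy, it can be worth verifying the environment. The sketch below assumes the deep-learning stack is PyTorch (suggested by the `.ckpt` checkpoint and TensorBoard logging mentioned further down, but not stated explicitly here):

```python
# Minimal environment sanity check. Assumes the project runs on PyTorch
# (implied by the .ckpt checkpoints and data loaders, but not stated above).
import sys

import torch

print("Python :", sys.version.split()[0])          # expected: 3.8.x
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA runtime  :", torch.version.cuda)   # we built against CUDA 11.1
    print("GPU           :", torch.cuda.get_device_name(0))
```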
To replicate one of our experiments, it should be possible to call `source ./setup.sh` to prepare the environment and then `source ./replicate.sh` to run the experiment. To replicate the experiment selected as final on Kaggle, choose to include the validation set in the training data when running `setup.sh`. However, please note that these scripts have not undergone extensive testing, so read the rest of this README and, if needed, follow the appropriate steps manually. Both scripts also need internet access; running them once without internet access can cause problems for later runs, and you may need to start fresh from the zip again.
The configuration files of the final experiment are located in `experiment_configs/final`. The main config to be used is `config.yaml`. Experiments can be run with `python3 run_experiment.py --config config.yaml`, e.g. `python3 run_experiment.py --config experiment_configs/exp_04a2_resnet50_gdl.yaml` for experimenting with ResNet50 and the Generalized Dice Loss.
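The exact schema of the config files is not documented here; the sketch below only inspects the keys that this README mentions elsewhere (`data_root_dir`, `model_class_name`, `cost_function_class_name`, `ensemble`, `logdir`) and assumes the configs are plain YAML mappings:

```python
# Sketch: inspect a config before launching an experiment. Only keys mentioned
# in this README are listed; the real config files may contain other keys too.
import yaml

with open("experiment_configs/final/config.yaml") as f:
    cfg = yaml.safe_load(f)

for key in ("data_root_dir", "model_class_name",
            "cost_function_class_name", "ensemble", "logdir"):
    print(f"{key}: {cfg.get(key, '<not set>')}")
```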
A checkpoint called `globe.ckpt` is needed. This is the checkpoint obtained after training ConvNext on DeepGlobe data; it can be downloaded from https://polybox.ethz.ch/index.php/s/1fAbWrYUuf3oLWP.
Furthermore, to run any experiments, the path to the data needs to be specified in the config under the name `data_root_dir`, which is set to `data/` by default. This directory is expected to contain the following directory structure (a minimal layout check is sketched after the tree):
- split
  - train
    - groundtruth
    - images
  - test
    - groundtruth
    - images
- train
- test
  - images
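A quick way to confirm the layout before training (a minimal sketch; it assumes the nesting shown above and that `data_root_dir` points at `data/`):

```python
# Sketch: verify that data_root_dir contains the expected directory layout.
# Directory names are taken from the tree above; adjust data_root_dir as needed.
from pathlib import Path

data_root_dir = Path("data")  # or whatever data_root_dir is set to in the config

expected = [
    "split/train/groundtruth",
    "split/train/images",
    "split/test/groundtruth",
    "split/test/images",
    "train",
    "test/images",
]

missing = [d for d in expected if not (data_root_dir / d).is_dir()]
if missing:
    print("Missing directories:", ", ".join(missing))
else:
    print("Data directory layout looks OK.")
```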
The model is trained on the images from `split/train`; `split/test` is used as the validation set. We used a validation set consisting of the following images:
- satimage_107.png
- satimage_126.png
- satimage_129.png
- satimage_131.png
- satimage_133.png
- satimage_138.png
- satimage_139.png
- satimage_21.png
- satimage_22.png
- satimage_29.png
- satimage_30.png
- satimage_32.png
- satimage_35.png
- satimage_37.png
- satimage_45.png
- satimage_54.png
- satimage_57.png
- satimage_60.png
- satimage_63.png
- satimage_65.png
- satimage_69.png
- satimage_6.png
- satimage_70.png
- satimage_83.png
- satimage_84.png
- satimage_85.png
- satimage_89.png
- satimage_94.png
- satimage_95.png
For the final experiment, we included the validation set in the training data (by copying the aforementioned images to `split/train`). We left copies of these images in `split/test`; consequently, validation numbers were still generated, but they are unreliable. We do not know how the pipeline behaves if these images are deleted from `split/test`.
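If you prefer to do this step by hand instead of via `setup.sh`, a sketch along these lines mirrors the description above. It assumes that the image and groundtruth files share the same file names, which is not stated explicitly here:

```python
# Sketch: copy the validation images into the training split, leaving the
# originals in split/test (as described above). Assumes image and groundtruth
# files share the same names -- adjust if that is not the case.
import shutil
from pathlib import Path

data_root_dir = Path("data")
val_images = ["satimage_107.png", "satimage_126.png"]  # ...use the full list above

for name in val_images:
    for sub in ("images", "groundtruth"):
        src = data_root_dir / "split" / "test" / sub / name
        dst = data_root_dir / "split" / "train" / sub / name
        shutil.copy2(src, dst)
```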
The required packages are listed in `requirements.txt` and can be installed using `pip3 install -r requirements.txt`.
- The script `run_experiment.py` creates an `ExperimentPipelineForSegmentation`, or an `EnsemblePipeline` if the experiment uses an ensemble model as indicated by the `ensemble` field in the configuration.
- It then calls the methods `prepare_experiment` and `run_experiment` on the pipeline. `prepare_experiment` performs the following steps:
  - Initialize the model to be trained
  - Create the optimizer and schedulers
  - Prepare the data loader for training
  - Create the loss function and evaluation metrics
  - Create the output folder in the directory indicated by the `logdir` attribute of the config (by default `runs`), and the TensorBoard summary writer
  - Prepare batch and epoch callbacks
- The `run_experiment` method of the pipeline executes training and creates a submission file with segmentations of the test images, using functions from the `mask_to_submission.py` script provided with the project, located in the `submit` directory.
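For orientation, the control flow described above corresponds roughly to the following sketch. Only the class names, method names, and the role of the `ensemble` field come from the description above; the import path and constructor signatures are assumptions:

```python
# Hypothetical sketch of the top-level flow of run_experiment.py, reconstructed
# from the description above; import paths and constructor arguments are guesses.
import argparse

import yaml

from pipelines import EnsemblePipeline, ExperimentPipelineForSegmentation  # assumed module

parser = argparse.ArgumentParser()
parser.add_argument("--config", required=True)
args = parser.parse_args()

with open(args.config) as f:
    config = yaml.safe_load(f)

# the `ensemble` field in the config decides which pipeline is built
if config.get("ensemble"):
    pipeline = EnsemblePipeline(config)
else:
    pipeline = ExperimentPipelineForSegmentation(config)

pipeline.prepare_experiment()  # model, optimizer, data loaders, loss, logdir, callbacks
pipeline.run_experiment()      # training + submission via submit/mask_to_submission.py
```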
- The class `VanillaDataLoaderUtil` returns training, validation, and test data loaders for the training pipeline.
- Data loaders are created from the dataset class `CILRoadSegmentationDataset`.
- Default configurations load the dataset from the `data` directory. This can be modified through the `data_root_dir` attribute in the config.
- `CILRoadSegmentationDataset` performs several transforms, such as normalizing the images, and (for training data) applies rotation and horizontal and vertical flips as data augmentation, implemented in the classes `SegRotationTransform`, `SegHorizontalFlip`, and `SegVerticalFlip`.
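For intuition, paired image/mask augmentation of the kind performed by `SegRotationTransform`, `SegHorizontalFlip`, and `SegVerticalFlip` could look roughly like the sketch below; the repository's actual transform classes may be implemented differently:

```python
# Illustrative sketch of paired image/mask augmentation; not the repository's
# actual transform classes, just the same idea expressed with torchvision.
import random

import torchvision.transforms.functional as TF


def random_flip_and_rotate(image, mask):
    """Apply identical random flips/rotation to an (image, mask) tensor pair."""
    if random.random() < 0.5:
        image, mask = TF.hflip(image), TF.hflip(mask)
    if random.random() < 0.5:
        image, mask = TF.vflip(image), TF.vflip(mask)
    angle = random.choice([0, 90, 180, 270])
    if angle:
        image, mask = TF.rotate(image, angle), TF.rotate(mask, angle)
    return image, mask
```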
- Several architectures can be loaded for training from `models/model_factory.py`.
- The class of the model to be trained can be indicated in the `model_class_name` attribute of the config (e.g. `PrunedConvnextSmall`).
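A name-based factory typically resolves the configured class name to a class and instantiates it. The sketch below illustrates the idea only; the lookup mechanism and constructor arguments in the actual `models/model_factory.py` may differ:

```python
# Hypothetical sketch of name-based model construction; the real factory in
# models/model_factory.py may be organized differently.
import importlib


def get_model(model_class_name: str, **kwargs):
    module = importlib.import_module("models.model_factory")  # assumed location
    model_cls = getattr(module, model_class_name)
    return model_cls(**kwargs)


# e.g. model = get_model(config["model_class_name"])  # "PrunedConvnextSmall"
```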
- Several cost functions are available in `training/cost_functions.py` as losses for training the segmentation models.
- The class of the loss to be used can be indicated in the `cost_function_class_name` attribute of the config (e.g. `BinaryGeneralizeDiceLoss`).
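For reference, a binary generalized Dice loss (in the sense of Sudre et al., 2017) can be written as sketched below; the repository's `BinaryGeneralizeDiceLoss` may differ in details such as smoothing, class weighting, or reduction:

```python
# Sketch of a binary generalized Dice loss for intuition only; not necessarily
# identical to BinaryGeneralizeDiceLoss in training/cost_functions.py.
import torch


def generalized_dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """pred, target: shape (N, H, W), values in [0, 1] (pred = predicted probabilities)."""
    pred, target = pred.flatten(1), target.flatten(1)
    # per-class weights: inverse squared volume of foreground / background
    w_fg = 1.0 / (target.sum(dim=1) ** 2 + eps)
    w_bg = 1.0 / ((1 - target).sum(dim=1) ** 2 + eps)
    intersection = w_fg * (pred * target).sum(dim=1) + w_bg * ((1 - pred) * (1 - target)).sum(dim=1)
    union = w_fg * (pred + target).sum(dim=1) + w_bg * (2 - pred - target).sum(dim=1)
    return (1 - 2 * intersection / (union + eps)).mean()
```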
We used Python 3.8.13+ during development.
The configuration of the experiment with the regular (non-generalized) Dice loss (also reaching 93.3%) can be found in `experiment_configs/old_dice_final`. Note that when ensembling is requested in the config, the configs to be ensembled are looked up on a path relative to the current working directory (not the directory of the original config). The same holds for the path to the data directory. Consequently, the working directory matters, and for some configs it might be necessary to move some files. We took steps to ensure that no file movement is needed for replicating the `final` and `old_dice_final` experiments, provided the working directory is the root of the repository.
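To make the caveat concrete, a relative path found in a config resolves against the working directory, not against the config file's own directory, roughly like this:

```python
# Illustration of the path caveat above: a relative path from the config is
# interpreted relative to the current working directory, not the config file.
from pathlib import Path

config_path = Path("experiment_configs/final/config.yaml")
relative_data_dir = "data/"  # example value as it might appear in a config

print(Path.cwd() / relative_data_dir)          # what the pipeline effectively uses
print(config_path.parent / relative_data_dir)  # NOT what the pipeline uses
```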
Please note that the Dice loss referred to in the report is `DiceLossV2` from this repository, and that if ensembles are used, validation scores are also generated but unreliable.
Moreover, on the first run, ConvNext weights need to be downloaded, so please make sure the script has internet access; otherwise you may encounter problems even if you provide internet access later. Also, a number of checkpoints is saved, which takes up quite a bit of storage under the `runs` directory. If needed, it should help to change the checkpoint save frequency in the config.