A white matter hyperintensity segmentation tool based on sysu_media, the winner of the MICCAI 2017 WMH challenge. A range of updates to functionality and some tweaks to the methodology have been made to improve the method.
The code has been tested with Python 3.6.4 but should work with most Python 3.6+ versions. The main dependencies are Keras 2.3.1, TensorFlow 1.14, and SimpleITK 2.0.1, among others. Installing the dependencies should be as simple as
pip3 install -r requirements.txt
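For reference, a minimal set of pins consistent with the versions above might look like the following; the repository's actual requirements.txt is authoritative and lists additional packages:

    Keras==2.3.1
    tensorflow==1.14.0
    SimpleITK==2.0.1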
The body of the method is in the wmh folder, which in turn contains the weights folder holding the weights of the pre-trained networks.
The main scripts for using the code are:

- eval.py, which evaluates a pre-trained network on a given dataset. This is the script to use for prediction.
- train.py, which trains a network either from scratch or from given starting weights.
- random_test_train_split.py, which, given a directory of subjects, splits them into training and testing sets and produces CSV files listing the subjects in each group.
- generate_hdf5.py, which takes a list of subjects (either a CSV from random_test_train_split.py or a directory it is pointed to), loads the images, preprocesses them, and saves the result into a .h5 file ready for training.
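The default file names used by eval.py suggest a per-subject layout along the lines of the sketch below. This is only illustrative: every name is configurable via --T1_name, --FLAIR_name, and --gt_name, and the FLAIR default in the help text is shown without a file extension.

    <data_dir>/
        <subject>/
            T1/T1_brain.nii.gz
            T2_FLAIR/T2_FLAIR              (extension as in your data)
            T2_FLAIR/lesions/final_mask.nii.gz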
An example command to evaluate a model given a .csv file from random_test_train_split.py:
python3 ./eval.py --csv_file ./subjects_ADNI_train.csv --num_unet 3 --compute_metrics --T1_name mri/brain-in-rawavg_mgz2nii.nii.gz --FLAIR_name flair.nii.gz --gt_name mri/wmh.nii.gz
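Once eval.py has written a segmentation, SimpleITK (already a dependency) can be used to load it and compute a simple volume estimate. A minimal sketch, assuming the output is a mask in which nonzero voxels mark WMH (check the actual output convention for your run):

    import SimpleITK as sitk

    # Load the predicted segmentation (default output name from eval.py)
    seg = sitk.ReadImage("wmh_seg.nii.gz")
    arr = sitk.GetArrayFromImage(seg)  # numpy array, ordered (slices, rows, cols)

    # Voxel volume in mm^3 from the image spacing
    sx, sy, sz = seg.GetSpacing()
    voxel_mm3 = sx * sy * sz

    # Count nonzero voxels and convert to a volume
    print("WMH volume: %.1f mm^3" % ((arr > 0).sum() * voxel_mm3))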
Full usage
usage: eval.py [-h] [--data_dir DATA_DIR] [--csv_file CSV_FILE]
[--pattern PATTERN] [--T1_name T1_NAME]
[--FLAIR_name FLAIR_NAME] [--gt_name GT_NAME]
[--output_name OUTPUT_NAME] [--rows_standard ROWS_STANDARD]
[--cols_standard COLS_STANDARD] [--batch_size N] [--verbose]
[--model_dir MODEL_DIR] [--FLAIR_only] [--num_unet NUM_UNET]
[--num_unet_start NUM_UNET_START] [--ignore_frac IGNORE_FRAC]
[--compute_metrics] [--model_suffix MODEL_SUFFIX]
WMH training
optional arguments:
-h, --help show this help message and exit
--data_dir DATA_DIR Directory containing data. Will be overridden if
--csv_file is supplied
--csv_file CSV_FILE CSV file listing subjects to include
--pattern PATTERN Pattern to match files in directory.
--T1_name T1_NAME Default name of T1 images. (default
T1/T1_brain.nii.gz)
--FLAIR_name FLAIR_NAME
Default name of T2FLAIR images. (default
T2_FLAIR/T2_FLAIR)
--gt_name GT_NAME Default name for ground truth segmentations (default
T2_FLAIR/lesions/final_mask.nii.gz)
--output_name OUTPUT_NAME
Name of output segmentation file. (default
wmh_seg.nii.gz)
--rows_standard ROWS_STANDARD
Height of input to network (Default 200)
--cols_standard COLS_STANDARD
Width of input to network (Default 200)
--batch_size N input batch size for training (default: 30)
--verbose, -v Flag to use verbose training. A single flag will cause
full verbosity. Double flag (e.g. -vv) will cause less
verbosity (use -vv in non-interactive environments
like cluster)
--model_dir MODEL_DIR
path to store model weights to (also path containing
starting weights for --resume) (default:
./wmh/weights)
--FLAIR_only Flag whether to just use FLAIR (default (if flag not
provided): use FLAIR and T1)
--num_unet NUM_UNET Number of networks to train (default: 1)
--num_unet_start NUM_UNET_START
Number from which to start training networks (i.e.
start from network 1 if network 0 is done) (default:
0)
--ignore_frac IGNORE_FRAC
Fraction of slices from top and bottom to ignore
(default: 0.125)
--compute_metrics Flag whether to compute metrics after segmentation
(requires ground truth)
--model_suffix MODEL_SUFFIX
Suffix to model name so model will save as
{num_net}_{suffix}.h5 (default: None)
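The same script also covers the FLAIR-only model. As a hypothetical example (paths are placeholders), an ensemble of three FLAIR-only networks run over a data directory, with --compute_metrics omitted since no ground truth is assumed:

    python3 ./eval.py --data_dir ../ADNI_data --FLAIR_only --num_unet 3 --FLAIR_name flair.nii.gz --output_name wmh_seg_flair.nii.gz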
For training, a dataset must first be prepared using generate_hdf5.py and/or random_test_train_split.py. For example, these are all the steps necessary to train a model from scratch on a dataset:
> python3 ./random_test_train_split.py --data_dir ../ADNI_data --csv_name ./subjects_ADNI
> python3 ./generate_hdf5.py --csv_file ./subjects_ADNI_train.csv --hdf5_name ADNI_train_data.hdf5 --T1_name mri/T1.nii.gz --FLAIR_name flair.nii.gz --gt_name mri/wmh.nii.gz
> python3 ./train.py --batch_size 32 --verbose --hdf5_name_train ADNI_train_data.hdf5
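To sanity-check the generated dataset before training, the file can be inspected with h5py (not listed among the dependencies above, but the standard reader for .h5/.hdf5 files). The dataset keys written by generate_hdf5.py are not documented here, so this sketch simply walks whatever the file contains:

    import h5py

    # Print the name and shape of every dataset in the training file
    with h5py.File("ADNI_train_data.hdf5", "r") as f:
        f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))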
Full usage for train.py
usage: train.py [-h] [--hdf5_name_train HDF5_NAME_TRAIN]
[--hdf5_name_test HDF5_NAME_TEST]
[--validation_split VALIDATION_SPLIT] [--batch_size N]
[--validation_batch_size N] [--epochs EPOCHS] [--verbose]
[--early_stopping] [--es_patience ES_PATIENCE]
[--es_metric {loss,dice,dsc,jaccard,tversky,focal-tversky}]
[--log_dir LOG_DIR] [--csv_log] [--tb_log]
[--model_dir MODEL_DIR]
[--loss {dice,jaccard,dsc,tversky,focal-tversky}]
[--metrics [{dice,jaccard,dsc,tversky,focal-tversky} [{dice,jaccard,dsc,tversky,focal-tversky} ...]]]
[--resume] [--FLAIR_only] [--no_aug] [--aug_factor AUG_FACTOR]
[--aug_theta AUG_THETA] [--aug_shear AUG_SHEAR]
[--aug_scale AUG_SCALE] [--num_unet NUM_UNET]
[--num_unet_start NUM_UNET_START] [--test_ensemble] [--lr LR]
[--output_test_aug] [--no_shuffle]
WMH training
optional arguments:
-h, --help show this help message and exit
--hdf5_name_train HDF5_NAME_TRAIN
path and name of hdf5-dataset for training (default:
test_train.hdf5)
--hdf5_name_test HDF5_NAME_TEST
path and name of hdf5-dataset for testing (default:
test_test.hdf5)
--validation_split VALIDATION_SPLIT
Fraction of data for validation. Will be overridden by
hdf5_name_test for explicit validation set. (default:
0.2)
--batch_size N input batch size for training (default: 30)
--validation_batch_size N
input batch size for validation (default: 30)
--epochs EPOCHS Number of epochs (default: 50)
--verbose, -v Flag to use verbose training output. -v will have
progress bar per epoch, -vv will print one line per
epoch (use this in non-interactive runs e.g. cluster)
--early_stopping Flag to use early stopping
--es_patience ES_PATIENCE
No. epochs over which to use patience in early
stopping (default: 20)
--es_metric {loss,dice,dsc,jaccard,tversky,focal-tversky}
Choice of early stopping monitoring metric (default:
loss)
--log_dir LOG_DIR Log directory for logging of training performance.
Requires --csv_log to be provided for logging
(default: None)
--csv_log Flag to store csv log
--tb_log Flag to store tensor board log
--model_dir MODEL_DIR
path to store model weights to (also path containing
starting weights for --resume) (default:
./wmh/weights)
--loss {dice,jaccard,dsc,tversky,focal-tversky}
Choice of loss function (default: dice)
--metrics [{dice,jaccard,dsc,tversky,focal-tversky} [{dice,jaccard,dsc,tversky,focal-tversky} ...]]
Choice of metric functions (default: None)
--resume Flag to resume training from checkpoints.
--FLAIR_only Flag whether to just use FLAIR (default (if flag not
provided): use FLAIR and T1)
--no_aug Flag to not do any augmentation
--aug_factor AUG_FACTOR
Factor by which to increase dataset by using
augmentation. i.e. the dataset will be x times bigger
after augmentation (default: 1 (results in twice as
big a dataset))
--aug_theta AUG_THETA
Degree of rotation to use in augmentation [degrees]
(default: 15)
--aug_shear AUG_SHEAR
Shear factor in augmentation (default: 0.1)
--aug_scale AUG_SCALE
Scaling factor in augmentation (default: 0.1)
--num_unet NUM_UNET Number of networks to train (default: 1)
--num_unet_start NUM_UNET_START
Number from which to start training networks (i.e.
start from network 1 if network 0 is done) (default:
0)
--test_ensemble Flag to test the overall ensemble performance once all
networks are trained
--lr LR Learning rate (default: 2e-4)
--output_test_aug Flag to save 10 test images from augmentation
generator
--no_shuffle Flag to not shuffle the slices during training
(default is to shuffle)
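The documented flags compose into more involved runs. For instance, a hypothetical command (placeholder paths) that resumes training of a three-network ensemble, skipping network 0, with early stopping on the Dice metric, CSV logging, and an ensemble test once all networks are trained:

    python3 ./train.py --hdf5_name_train ADNI_train_data.hdf5 --resume --num_unet 3 --num_unet_start 1 --early_stopping --es_metric dice --csv_log --log_dir ./logs --test_ensemble -vv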
This segmentation code is based on the sysu_media code published in NeuroImage. Please cite our work if you find the code useful for your research.