CTCoreNet
Neural network for classifying rock clasts in computed tomography (CT) scans of sediment cores.
Getting started
Quickstart
Launch in Pangeo Binder (an interactive JupyterLab environment in the cloud).
Usage
Training data preparation
Images are manually labeled by drawing polygons around objects of interest, e.g. ice-rafted debris (IRD) rock clasts. The polygons are drawn using the labelme GUI program, which is launched from the command line using:
labelme
Follow the single image annotation tutorial at https://github.com/wkentaro/labelme/tree/master/examples/tutorial to draw the polygons. The polygons are stored in a JSON file, and can be converted to image masks using:
labelme_json_to_dataset data/CT_data/CORE_42.json -o data/train/CORE_42
This will produce a folder named CORE_42 containing 4 files:
- img.png
- label.png
- label_names.txt
- label_viz.png
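Before training, it can be useful to confirm that every converted sample folder is complete. The following is a minimal sketch, not part of the project: the helper names `missing_files` and `check_training_data` are hypothetical, and it assumes each sample lives in its own subfolder of data/train containing the four files listed above.

```python
from pathlib import Path

# File names written by labelme_json_to_dataset, as listed above
EXPECTED_FILES = ("img.png", "label.png", "label_names.txt", "label_viz.png")


def missing_files(sample_dir):
    """Return the expected file names that are absent from one sample folder."""
    sample_dir = Path(sample_dir)
    return [name for name in EXPECTED_FILES if not (sample_dir / name).is_file()]


def check_training_data(train_dir="data/train"):
    """Map each incomplete sample folder name to its list of missing files."""
    problems = {}
    for sample_dir in sorted(Path(train_dir).iterdir()):
        if sample_dir.is_dir():
            missing = missing_files(sample_dir)
            if missing:
                problems[sample_dir.name] = missing
    return problems


if __name__ == "__main__":
    # Only scan the default folder if it exists (assumed layout: data/train/<sample>/)
    if Path("data/train").is_dir():
        for folder, missing in check_training_data().items():
            print(f"{folder}: missing {', '.join(missing)}")
```

An empty result from `check_training_data()` means every sample folder has all four files.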
Running the neural network
The model can be trained using the following command:
python ctcorenet/ctcoreunet.py
This will load the image data stored in data/train, perform the training (minimizing the loss between the model's predictions for img.png and the label.png masks), and produce some outputs.
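To illustrate what minimizing a loss between a prediction and a label mask means, here is a small pure-Python sketch. It assumes a pixel-wise binary cross-entropy loss, which is a common choice for segmentation masks; this README does not state which loss ctcoreunet.py actually uses, so treat the function and the toy values below as illustrative only.

```python
import math


def binary_cross_entropy(predicted, target, eps=1e-7):
    """Mean pixel-wise binary cross-entropy between two equally sized 2D grids.

    predicted: rows of probabilities in [0, 1] (model output for img.png)
    target:    rows of 0/1 values (mask such as label.png)
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(predicted, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count


# Toy 2x2 mask and two hypothetical predictions (values are made up)
label = [[0, 1], [1, 0]]
close_prediction = [[0.1, 0.9], [0.8, 0.2]]
poor_prediction = [[0.9, 0.1], [0.2, 0.8]]

# A prediction that agrees with the mask has the lower loss
assert binary_cross_entropy(close_prediction, label) < binary_cross_entropy(poor_prediction, label)
```

Training repeatedly adjusts the model weights so that this kind of score decreases across the labeled images.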
More advanced users can customize the training, for example to make it more deterministic, run for only a fixed number of epochs, or train on a GPU using 16-bit precision, like so:
python ctcorenet/ctcoreunet.py --deterministic=True --max_epochs=3 --gpus=1 --precision=16
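When launching several runs with different settings, the command can also be assembled from Python. This is a sketch only: `build_train_command` is a hypothetical helper, not part of ctcorenet, and it simply turns keyword arguments into `--key=value` flags of the form shown above without checking that the script accepts them.

```python
import shlex


def build_train_command(script="ctcorenet/ctcoreunet.py", **options):
    """Build the argument list for a training run; options become --key=value flags."""
    command = ["python", script]
    for key, value in options.items():
        command.append(f"--{key}={value}")
    return command


if __name__ == "__main__":
    command = build_train_command(deterministic=True, max_epochs=3, gpus=1, precision=16)
    # Print the command for inspection; to execute it, pass the list to
    # subprocess.run(command, check=True) from the repository root.
    print(shlex.join(command))
```

Keeping the command as a list avoids shell quoting issues if it is later passed to `subprocess.run`.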
More options to customize the training can be found by running:
python ctcorenet/ctcoreunet.py --help
Reproducing the entire pipeline
To manage the whole machine learning workflow, this project uses the Data Version Control (DVC) library, which stores all the commands and the input, intermediate, and output data assets used. This makes it possible to reproduce the entire pipeline with a single command:
dvc repro
This command will perform all the data preparation and model training steps. For more information, see https://dvc.org/doc/start/data-pipelines.