This repository contains the modelling code for the HABNet project: HABNet: Machine Learning, Remote Sensing Based Detection and Prediction of Harmful Algal Blooms.
The HABNet datacube extraction code is available here.
This code generates classification scores for HAB databases.
There are two basic classification methods:
- Extract features from each frame with a ConvNet and pass the sequence to an RNN in a separate network.
- Extract features from each frame with a ConvNet and pass the sequence to an MLP, LSTM or random forest (RF).
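A minimal sketch of the second method's data flow follows. It assumes each frame has already been reduced to a fixed-length CNN feature vector, and shows how a sequence of those vectors can either be kept as a (seqLength, featureLength) sequence for an LSTM or concatenated into one vector for an MLP or RF. The function names are hypothetical, and whether the MLP/RF models here flatten the sequence in exactly this way is an assumption.

```python
from typing import List, Sequence


def as_sequence(frame_features: Sequence[Sequence[float]], seq_length: int) -> List[List[float]]:
    """Return per-frame CNN features as a (seq_length, feature_length) sequence, e.g. for an LSTM."""
    if len(frame_features) != seq_length:
        raise ValueError(f"expected {seq_length} frames, got {len(frame_features)}")
    if len({len(frame) for frame in frame_features}) != 1:
        raise ValueError("every frame must have the same feature length")
    return [list(frame) for frame in frame_features]


def as_flat_vector(frame_features: Sequence[Sequence[float]], seq_length: int) -> List[float]:
    """Concatenate the frames into one feature vector, e.g. for an MLP or random forest (assumed layout)."""
    return [value for frame in as_sequence(frame_features, seq_length) for value in frame]


# Placeholder features: three frames (time steps), two features per frame.
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(as_sequence(frames, seq_length=3))     # 3 time steps x 2 features
print(as_flat_vector(frames, seq_length=3))  # 6 values in time order
```

Keeping the time axis matters for the LSTM, which consumes one frame per step; the MLP and RF see the whole temporal span as a single vector.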
This code requires Keras 2 and TensorFlow 1 or later. Please see the requirements.txt
file. To make sure your environment is up to date, run:
pip install -r requirements.txt
The data are extracted using a MATLAB script and saved to the CNNIms directory (one PNG per time stamp).
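Since there is one PNG per time stamp, the frames for a sequence have to be collected in time order before feature extraction. The sketch below assumes that file names sort chronologically (for example, a zero-padded date in the name); that naming scheme, the helper name `select_frames` and the `cube_dir` variable in the comment are hypothetical.

```python
from typing import Iterable, List


def select_frames(filenames: Iterable[str], seq_length: int) -> List[str]:
    """Keep the PNG files, sorted by name, and check that there is one per time step.

    Assumes file names sort chronologically (e.g. a zero-padded date in the name).
    """
    pngs = sorted(name for name in filenames if name.lower().endswith(".png"))
    if len(pngs) != seq_length:
        raise ValueError(f"expected {seq_length} PNG frames, found {len(pngs)}")
    return pngs


# Placeholder names; on real data this could be select_frames(os.listdir(cube_dir), seq_length).
names = ["20140703.png", "20140701.png", "notes.txt", "20140702.png"]
print(select_frames(names, seq_length=3))
```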
For the models 'lstm0', 'lstm1', 'lstm2', 'mlp1', 'mlp2' and 'RF', features are first extracted from each PNG image with a ConvNet.

Keras-based models:
- model = lstm0: best-performing LSTM (batch normalisation and dropout)
- model = lstm1: LSTM with dropout
- model = lstm2: LSTM with batch normalisation
- model = mlp1: MLP with batch normalisation
- model = mlp2: MLP with dropout
These models save the trained model to an H5 file named [seqName + model].
Non-Keras-based models:
- model = svm: support vector machine (currently not working)
- model = xbg: state-of-the-art boosting method
- model = RF: random forest
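The model options above can be summarised in a small lookup table. This is a hypothetical sketch, not this repository's code: the `MODELS` dictionary, the helper names and the `.h5` extension are assumptions. It only encodes which names are Keras-based and the [seqName + model] naming of the saved H5 file.

```python
# Hypothetical registry of the model names listed above: name -> (backend, description).
MODELS = {
    "lstm0": ("keras", "LSTM with batch normalisation and dropout (best performing)"),
    "lstm1": ("keras", "LSTM with dropout"),
    "lstm2": ("keras", "LSTM with batch normalisation"),
    "mlp1": ("keras", "MLP with batch normalisation"),
    "mlp2": ("keras", "MLP with dropout"),
    "svm": ("non-keras", "support vector machine (currently not working)"),
    "xbg": ("non-keras", "boosting method"),
    "RF": ("non-keras", "random forest"),
}


def is_keras_model(model: str) -> bool:
    """Return True if the named model is one of the Keras-based topologies."""
    try:
        backend, _ = MODELS[model]
    except KeyError:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODELS)}") from None
    return backend == "keras"


def h5_model_path(seq_name: str, model: str) -> str:
    """File name of a saved Keras model: seqName + model, with an assumed '.h5' extension."""
    if not is_keras_model(model):
        raise ValueError(f"{model!r} is not a Keras model, so no H5 file is expected")
    return seq_name + model + ".h5"


print(h5_model_path("NASNet11_", "lstm0"))  # NASNet11_lstm0.h5
```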
The actual training (and testing) is done using the Python file.
Training is controlled by an input XML configuration file (e.g. ./cnfgXMLs/NASNet11_lstm0.xml) whose elements control the training process. The elements of a typical config file are:
- inDir: the directory where the PNG images are stored
- dataDir: the directory where all data produced during training are written
- seqName: name of the CNN output (NumPy file format)
- featureLength: length of the feature vectors output by the CNN
- SVDFeatLen: reduced feature length after the first stage; set to -1 to keep the same size [currently not implemented]
- model: name of the model topology, one of lstm0, lstm1, lstm2, mlp1, mlp2, RF
- seqLength: number of PNG images in each modality (temporal span)
- batchSize: training batch size (TensorFlow)
- epochNumber: number of training epochs (TensorFlow)
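A hypothetical configuration built only from the element names listed above might look like the string below, loaded here with Python's standard `xml.etree.ElementTree`. The root tag (`config`), the flat layout, which fields are integers, and every value are placeholders; check them against the files in ./cnfgXMLs/.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

# Hypothetical config: element names come from the list above; the root tag and all values are placeholders.
EXAMPLE_XML = """
<config>
  <inDir>./CNNIms/</inDir>
  <dataDir>./data/</dataDir>
  <seqName>NASNet11</seqName>
  <featureLength>1024</featureLength>
  <SVDFeatLen>-1</SVDFeatLen>
  <model>lstm0</model>
  <seqLength>10</seqLength>
  <batchSize>32</batchSize>
  <epochNumber>100</epochNumber>
</config>
"""

# Elements assumed to hold integers; everything else is kept as a string.
INT_FIELDS = ("featureLength", "SVDFeatLen", "seqLength", "batchSize", "epochNumber")


def load_config(xml_text: str) -> Dict[str, Union[str, int]]:
    """Parse a flat XML config into a dict mapping element name to value."""
    root = ET.fromstring(xml_text.strip())
    config: Dict[str, Union[str, int]] = {child.tag: (child.text or "").strip() for child in root}
    for key in INT_FIELDS:
        if key in config:
            config[key] = int(config[key])
    return config


config = load_config(EXAMPLE_XML)
print(config["model"], config["seqLength"], config["batchSize"])
```

For a config file on disk, `ET.parse(path).getroot()` returns the same root element that `ET.fromstring` produces from a string.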
Testing and classification are done using the following Python scripts:
- This file contains a script that takes an XML configuration file (defining the trained models) and a directory containing the directory structure of bottleneck features extracted from the datacube images by the defined CNNs. The classifications and probabilities for all datacubes are written to a text file (classesProbs.txt) in the date-based directory (inDir). All directory names are taken from the test directories so that they can be correlated with the lat and lon data in 'latLonList.txt' (generated by test_cubeSequence.m).
- This script demonstrates how to generate a list of classifications (and probabilities) from an extracted grid of lats, lons and dates (output to a text file). [This is the best place to start when running demos of test classifications.]
- This script shows how to chain all the necessary parts together to get a detection from a lat, lon and date, i.e. generate a datacube, generate images from the datacube, extract features from the images and then classify the features. [Not currently supported, but a good demonstration.]
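The correlation step performed by the first script above (matching datacube directory names with lat/lon values) could look roughly like the sketch below. Both line formats, whitespace-separated `name label probability` for the classifications and `name lat lon` for the locations, are assumptions made for illustration; the actual layouts of classesProbs.txt and latLonList.txt may differ.

```python
from typing import Dict, Iterable, List, Tuple


def parse_rows(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map the first whitespace-separated token of each line (assumed: datacube directory name) to the rest."""
    rows: Dict[str, List[str]] = {}
    for line in lines:
        parts = line.split()
        if parts:
            rows[parts[0]] = parts[1:]
    return rows


def join_results(class_lines: Iterable[str],
                 latlon_lines: Iterable[str]) -> List[Tuple[str, float, float, str, float]]:
    """Return (name, lat, lon, label, probability) for every directory name present in both inputs."""
    classes = parse_rows(class_lines)
    latlon = parse_rows(latlon_lines)
    joined = []
    for name in sorted(classes.keys() & latlon.keys()):
        label, prob = classes[name][0], float(classes[name][1])
        lat, lon = float(latlon[name][0]), float(latlon[name][1])
        joined.append((name, lat, lon, label, prob))
    return joined


# Placeholder rows in the assumed formats.
class_lines = ["cube_001 1 0.87", "cube_002 0 0.64"]
latlon_lines = ["cube_001 26.5 -82.1", "cube_002 27.0 -82.5", "cube_003 27.5 -83.0"]
for row in join_results(class_lines, latlon_lines):
    print(row)
```

With real files, both arguments can simply be open file objects, because iterating over a file yields its lines.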
- genSingleH5sWrapper.m: top-level wrapper that generates an H5 file from a lat, lon and date (loads all configuration information and then calls genSingleH5s).
- outputImagesFromDataCubeScript.m: loads all configurations and calls outputImagesFromDataCube.m (takes an H5 datacube and outputs images for feature extraction and classification).
- Debug and test on single HAB events
- Create a "whole model" with one (fine-tuned) CNN for each modality
- Try removing more layers of the CNNs
- Mend the SVM bottom layer
- Evaluate the classification performance of each set of features
- Implement SVD properly