AIM: This project aims to track changes in water level using satellite imagery and deep learning.
Table of Contents:
- Introduction
- Datasets
- Labeling
- Data Augmentation
- Metrics
- U-Net architecture
- Model Optimization
- Model Results
- Dashboard
- Technical Stack
The motivation for this project is the article "Some of the World's Biggest Lakes Are Drying Up", published in the March 2018 edition of National Geographic magazine.
Freshwater is the most important resource for mankind, cutting across all social, economic and environmental activities. It is a condition for all life on our planet, an enabling or limiting factor for any social and technological development, and a possible source of welfare or misery, cooperation or conflict.
The exponential growth of satellite-based information over the past four decades has provided unprecedented opportunities to improve water resource management.
The first dataset is NWPU-RESISC45 (referred to as Resic-45 in this document), a publicly available dataset. It contains 31,500 images covering 45 scene classes (including water classes), with 700 images in each class.
The second dataset is a time series of cloudless Sentinel-2 imagery covering 17 critically endangered lakes, as follows:
- Lake Poopo, Bolivia;
- Lake Urmia, Iran;
- Lake Mojave, USA;
- Aral Sea, Kazakhstan;
- Lake Copais, Greece;
- Lake Ramganga, India;
- Qinghai Lake, China;
- Salton Sea, USA;
- Lake Faguibine, Mali;
- Mono Lake, USA;
- Walker Lake, USA;
- Lake Balaton, Hungary;
- Lake Koroneia, Greece;
- Lake Salda, Turkey;
- Lake Burdur, Turkey;
- Lake Mendocino, USA;
- Elephant Butte Reservoir, USA.
The MakeSense online tool was used to label the images from both datasets. It requires only a web browser, with no installation. It is an excellent choice for small computer vision deep learning projects because it makes dataset preparation easier and faster.
The following techniques have been applied during training:
- Height shift up to 30%;
- Horizontal flip;
- Rotation up to 45 degrees;
- No shear;
- Vertical flip;
- Width shift up to 30%;
- Zoom between 75% and 125%.
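The settings listed above map closely onto the arguments of Keras' `ImageDataGenerator`. The following is a minimal sketch, assuming that class was used (the source does not say so). The helper name `make_augmented_flow`, the batch size and the seed are hypothetical. For segmentation, the image and its mask need the same random transform, which is commonly achieved by giving two generators the same seed.

```python
# Keyword arguments mirroring the augmentation list above.
AUGMENTATION_ARGS = dict(
    height_shift_range=0.3,   # vertical shift up to 30% of the image height
    horizontal_flip=True,
    rotation_range=45,        # random rotation up to 45 degrees
    shear_range=0.0,          # no shear
    vertical_flip=True,
    width_shift_range=0.3,    # horizontal shift up to 30% of the image width
    zoom_range=[0.75, 1.25],  # zoom between 75% and 125%
)


def make_augmented_flow(images, masks, batch_size=8, seed=42):
    """Return an iterator of (image_batch, mask_batch) pairs.

    `images` and `masks` are rank-4 NumPy arrays (N, H, W, C).
    Hypothetical helper: name, batch size and seed are assumptions.
    """
    # Imported lazily so the settings above can be inspected without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    image_gen = ImageDataGenerator(**AUGMENTATION_ARGS)
    mask_gen = ImageDataGenerator(**AUGMENTATION_ARGS)
    # The shared seed keeps each image and its mask geometrically aligned.
    image_flow = image_gen.flow(images, batch_size=batch_size, seed=seed)
    mask_flow = mask_gen.flow(masks, batch_size=batch_size, seed=seed)
    return zip(image_flow, mask_flow)
```

Because the transforms interpolate pixel values, augmented masks may no longer be strictly binary and may need to be re-thresholded before computing the loss.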
The following metrics have been used to evaluate the semantic segmentation model:
- Jaccard Index
- Dice Coefficient
More information about both of these metrics can be found here.
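For reference, both metrics can be computed directly from a pair of binary masks. The sketch below uses NumPy and is an illustration rather than the project's evaluation code. The function names and the small `eps` term (which avoids division by zero when both masks are empty) are assumptions.

```python
import numpy as np


def jaccard_index(y_true, y_pred, eps=1e-7):
    """Intersection over union of two binary masks."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return (intersection + eps) / (union + eps)


def dice_coefficient(y_true, y_pred, eps=1e-7):
    """Twice the intersection divided by the sum of both mask areas."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    return (2.0 * intersection + eps) / (y_true.sum() + y_pred.sum() + eps)


# Toy example: 3 water pixels in each mask, 2 of them shared.
truth = [[1, 1, 0], [0, 1, 0]]
pred = [[1, 0, 0], [0, 1, 1]]
print(round(jaccard_index(truth, pred), 3))     # 2 / 4
print(round(dice_coefficient(truth, pred), 3))  # 4 / 6
```

The two metrics are monotonically related (Dice = 2J / (1 + J)), so they rank models identically on a single image, but Dice gives more weight to the overlap.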
We used a simple U-Net model architecture. This strategy allows us to modify the model for our own purposes and fine-tune it as necessary. By using this network architecture, we could spend more time understanding the optimization strategies.
Train/Validation/Test splits based on Resic-45 dataset only:
- training set: 489 images;
- validation set: 140 images;
- test set: 71 images.
Model performance:
Train/Validation/Test splits based on Resic-45 dataset only:
- training set: 979 images;
- validation set: 280 images;
- test set: 122 images.
Model performance:
The baseline model clearly overfits when trained with image augmentation.
The following strategies have been explored:
- Using Early Stopping and adaptive learning rates;
- Using a bigger model (and dropout);
- Using regularization (Batch Normalization);
- Using residual connections;
- Dealing with class imbalance using dice loss;
- Refining label images using CRFs;
- Ensemble predictions.
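To illustrate the class-imbalance item in the list above, here is a minimal NumPy sketch of a soft Dice loss. It is an assumption about the general form of the loss, not the project's training code. The function name and `eps` smoothing term are hypothetical. Unlike pixel-wise cross entropy, the loss depends on the overlap with the water class, so a model that ignores a small lake is penalized heavily.

```python
import numpy as np


def soft_dice_loss(y_true, y_prob, eps=1e-7):
    """1 - Dice, computed on predicted probabilities instead of hard masks."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_prob = np.asarray(y_prob, dtype=np.float64)
    intersection = np.sum(y_true * y_prob)
    denominator = np.sum(y_true) + np.sum(y_prob)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


# Imbalanced toy mask: a single water pixel out of 100.
mask = np.zeros((10, 10))
mask[0, 0] = 1.0

all_background = np.zeros((10, 10))          # misses the only water pixel
print(soft_dice_loss(mask, mask))            # perfect prediction, loss near 0
print(soft_dice_loss(mask, all_background))  # loss near 1 despite 99% pixel accuracy
```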
Train/Validation/Test splits:
- training set: 489 images from the Resic-45 dataset, randomly transformed at each epoch using one of the techniques described in the Data Augmentation section;
- validation set: 211 images from Resic-45 dataset;
- test set: 359 images from Sentinel-2 dataset.
Model performance using binary cross entropy as the loss function:
Model performance using dice loss as the loss function:
The test set to measure the results presented below is based on 182 images from Sentinel-2 dataset.
Model 1: U-Net residual model trained without label correction:
Model 2: U-Net residual model trained with label correction using Conditional Random Fields:
Model 3: Ensemble model based on the two previous models:
The ensemble model has the highest accuracy (97.15%) and is the one used in the Dashboard application covered in the next section.
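The source does not state how the two models are combined. One common approach, shown here purely as an assumption, is to average the per-pixel water probabilities of both models and threshold the result. The function name `ensemble_mask` and the 0.5 threshold are hypothetical.

```python
import numpy as np


def ensemble_mask(prob_maps, threshold=0.5):
    """Average per-pixel probabilities from several models, then binarize."""
    stacked = np.stack([np.asarray(p, dtype=np.float64) for p in prob_maps])
    mean_prob = stacked.mean(axis=0)
    return (mean_prob >= threshold).astype(np.uint8)


# Toy 2x2 probability maps standing in for the outputs of Model 1 and Model 2.
model_1 = [[0.9, 0.4], [0.2, 0.6]]
model_2 = [[0.7, 0.7], [0.1, 0.3]]
print(ensemble_mask([model_1, model_2]))
```

Averaging lets a confident prediction from one model outweigh a marginal one from the other, which is why the two top-right and bottom-right pixels in the toy example end up in different classes.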
The dashboard can be executed with the following command:
python app.py
A demo is available here.
The following libraries must be installed in the virtual environment. The creation of the virtual environment is detailed in the next section.
- Cython
- Dash
- Matplotlib
- NumPy
- Pillow
- Plotly
- Pydensecrf
- Rasterio
- Requests
- TensorFlow 2.4