Sen1Floods11: a georeferenced dataset to train and test deep learning flood algorithms for Sentinel-1 (Example). This data was generated by Cloud to Street, a Public Benefit Corporation: https://www.cloudtostreet.info/. For questions about this dataset or code, please email support@cloudtostreet.info. Please cite this data as:

Bonafilia, D., Tellman, B., Anderson, T., Issenberg, E. 2020. Sen1Floods11: a georeferenced dataset to train and test deep learning flood algorithms for Sentinel-1. The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 210-211.

Available via open access at: http://openaccess.thecvf.com/content_CVPRW_2020/html/w11/Bonafilia_Sen1Floods11_A_Georeferenced_Dataset_to_Train_and_Test_Deep_Learning_CVPRW_2020_paper.html

Dataset Access

The dataset is available in the Google Cloud Storage bucket gs://cnn_chips/

You can access the bucket with the gsutil command-line tool. To download the entire dataset (~14 GB), use gsutil rsync to clone the bucket to a local directory. The -m flag, which runs transfers in parallel, is recommended to speed up downloads. For example:

$ gsutil -m rsync -r gs://cnn_chips /YOUR/LOCAL/DIRECTORY/HERE

If you are using the example notebooks, download the dataset to the folder they expect (/home/files3) by running:

$ mkdir /home/files3
$ gsutil -m rsync -r gs://cnn_chips /home/files3
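If you would rather start the download from inside a notebook, the commands above can be wrapped in Python. This is a minimal sketch that assumes gsutil is installed, authenticated, and on your PATH; the helper names rsync_command and download_dataset are hypothetical and are not part of this repository. Passing dry_run=True adds rsync's -n flag so you can preview what would be copied before transferring ~14 GB. Creating /home/files3 may require elevated permissions on your machine.

```python
import subprocess
from pathlib import Path
from typing import List

# Bucket named in this README; the default destination is the folder
# the example notebooks expect.
BUCKET = "gs://cnn_chips"
NOTEBOOK_DIR = "/home/files3"


def rsync_command(dest: str = NOTEBOOK_DIR, bucket: str = BUCKET,
                  dry_run: bool = False) -> List[str]:
    """Build the `gsutil -m rsync -r BUCKET DEST` call as an argument list."""
    cmd = ["gsutil", "-m", "rsync", "-r"]
    if dry_run:
        cmd.append("-n")  # rsync -n: report what would be copied, copy nothing
    cmd += [bucket, str(dest)]
    return cmd


def download_dataset(dest: str = NOTEBOOK_DIR, dry_run: bool = False) -> None:
    """Create `dest` if needed, then mirror the bucket into it (hypothetical helper)."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    subprocess.run(rsync_command(dest, dry_run=dry_run), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect (or log) before anything is downloaded.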

Dataset Information

Each file follows the naming scheme EVENT_CHIPID_LAYER.tif (e.g., Bolivia_103757_S2.tif). Chip IDs are unique and are not shared between events. Events are named by country; further information on each event (including dates) can be found in the event metadata below. Each layer is stored as a separate GeoTIFF, which may contain multiple stacked bands. All images are projected to WGS 84 (EPSG:4326) at 10 m ground resolution.
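Because every layer of a chip shares the same EVENT_CHIPID prefix, the naming scheme is enough to pair S1, S2, and QC files. The sketch below is illustrative only: the function names are hypothetical, and it assumes that CHIPID and LAYER never contain underscores (so splitting from the right on the last two underscores is safe).

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, NamedTuple


class ChipFile(NamedTuple):
    event: str
    chip_id: str
    layer: str


def parse_chip_filename(path: str) -> ChipFile:
    """Split a name of the form EVENT_CHIPID_LAYER.tif into its three parts."""
    p = Path(path)
    if p.suffix.lower() != ".tif":
        raise ValueError(f"expected a .tif file, got {p.name!r}")
    # Split from the right so only the last two underscores act as separators
    # (assumption: CHIPID and LAYER never contain underscores).
    parts = p.stem.rsplit("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"{p.name!r} does not match EVENT_CHIPID_LAYER.tif")
    return ChipFile(*parts)


def group_by_chip(paths: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Map 'EVENT_CHIPID' to {layer: path}, e.g. {'S1': ..., 'S2': ..., 'QC': ...}."""
    chips: Dict[str, Dict[str, str]] = defaultdict(dict)
    for path in paths:
        info = parse_chip_filename(path)
        chips[f"{info.event}_{info.chip_id}"][info.layer] = path
    return dict(chips)
```

For example, parse_chip_filename("Bolivia_103757_S2.tif") yields event "Bolivia", chip ID "103757", and layer "S2". Since chip IDs are not shared between events, the chip ID alone would also work as a key; keeping the event in the key just makes it visible.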

QC: Hand-labeled chips containing ground truth
  Values: -1 = No Data / Not Valid; 0 = Not Water; 1 = Water
  Format: GeoTIFF, 512 x 512, 1 band
  Bands: 0: QC

S1: Raw Sentinel-1 imagery (IW mode, GRD product). See here for information on preprocessing.
  Values: backscatter in dB
  Format: GeoTIFF, 512 x 512, 2 bands
  Bands: 0: VV; 1: VH

S2: Raw Sentinel-2 MSI Level-1C imagery containing all 13 spectral bands (B1 to B12, including B8A); no QA mask is included.
  Values: TOA reflectance, scaled by 10000
  Format: GeoTIFF, 512 x 512, 13 bands
  Bands:
  0: B1 (Coastal)
  1: B2 (Blue)
  2: B3 (Green)
  3: B4 (Red)
  4: B5 (RedEdge-1)
  5: B6 (RedEdge-2)
  6: B7 (RedEdge-3)
  7: B8 (NIR)
  8: B8A (Narrow NIR)
  9: B9 (Water Vapor)
  10: B10 (Cirrus)
  11: B11 (SWIR-1)
  12: B12 (SWIR-2)
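When working with these layers in code, the band order and value encodings above are what matter. The sketch below assumes each chip is loaded as a NumPy array shaped (bands, rows, cols), which is what rasterio's read() returns; rasterio is an assumption here, not necessarily what the example notebooks use, and all helper names are hypothetical. The dB-to-linear conversion is the standard 10 ** (dB / 10).

```python
from typing import Sequence

import numpy as np

# Band order of the S1 and S2 GeoTIFFs, as listed in the layer table.
S1_BANDS = ["VV", "VH"]
S2_BANDS = ["B1", "B2", "B3", "B4", "B5", "B6", "B7",
            "B8", "B8A", "B9", "B10", "B11", "B12"]
QC_NODATA = -1  # "No Data / Not Valid" in the QC layer


def band(chip: np.ndarray, name: str, names: Sequence[str]) -> np.ndarray:
    """Select one band by name from a (bands, rows, cols) array."""
    return chip[list(names).index(name)]


def s2_reflectance(s2: np.ndarray) -> np.ndarray:
    """Undo the x10000 scaling to get TOA reflectance as floats."""
    return s2.astype(np.float32) / 10000.0


def db_to_linear(s1_db: np.ndarray) -> np.ndarray:
    """Convert Sentinel-1 backscatter from dB to linear power."""
    return np.power(10.0, s1_db.astype(np.float64) / 10.0)


def valid_mask(qc: np.ndarray) -> np.ndarray:
    """True where the hand label is usable (0 = not water, 1 = water)."""
    return qc != QC_NODATA


def read_chip(path: str) -> np.ndarray:
    """Read every band of a chip GeoTIFF (requires the optional rasterio package)."""
    import rasterio  # imported lazily so the helpers above work without it

    with rasterio.open(path) as src:
        return src.read()  # shape: (band count, rows, cols)
```

Note that band index and band name differ from index 8 onward because B8A sits between B8 and B9, so looking bands up by name avoids off-by-one mistakes.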

Example images

A sample of the dataset for chip Spain_7370579 is provided in ./sample.

Example Use

Main_Training_Stuff.ipynb shows how to run the training loop on the dataset. Test_Models.ipynb shows how to evaluate a model on the test sets.
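One common way to score a flood map against the QC labels is intersection over union (IoU) on the water class, skipping pixels labeled -1. The sketch below is an illustration, not necessarily the metric Test_Models.ipynb computes; the function names are hypothetical, and treating a chip with no water in either mask as a perfect score (1.0) is a convention chosen here.

```python
from typing import Iterable, Tuple

import numpy as np


def iou_counts(pred: np.ndarray, qc: np.ndarray) -> Tuple[int, int]:
    """Return (intersection, union) pixel counts for the water class.

    pred: binary prediction (1 = water, 0 = not water), same shape as qc.
    qc:   hand label with -1 = no data, 0 = not water, 1 = water.
    """
    valid = qc != -1
    pred_water = (pred == 1) & valid
    true_water = (qc == 1) & valid
    return int((pred_water & true_water).sum()), int((pred_water | true_water).sum())


def water_iou(pred: np.ndarray, qc: np.ndarray) -> float:
    """IoU for a single chip; 1.0 if neither mask contains water."""
    inter, union = iou_counts(pred, qc)
    return 1.0 if union == 0 else inter / union


def dataset_iou(pairs: Iterable[Tuple[np.ndarray, np.ndarray]]) -> float:
    """IoU over a whole test set, summing pixel counts across all chips."""
    total_inter = total_union = 0
    for pred, qc in pairs:
        inter, union = iou_counts(pred, qc)
        total_inter += inter
        total_union += union
    return 1.0 if total_union == 0 else total_inter / total_union
```

Summing counts across chips (dataset_iou) weights every pixel equally, while averaging water_iou over chips weights every chip equally; the two can differ noticeably, so it is worth stating which one you report.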

Event Metadata

Flood event locations and metadata are stored in Sen1Floods11_Metadata.geojson, which contains the following fields:

Field            Description
ID               Unique ID for each event
location         Flood event location (country)
ISO_CC           ISO country code of the flood event location
s1_date          Date (YYYY-MM-dd) on which the Sentinel-1 image was acquired
s2_date          Date (YYYY-MM-dd) on which the Sentinel-2 image was acquired
orbit            Orbit direction (ASCENDING or DESCENDING) of the Sentinel-1 acquisition
rel_orbit_num    Relative orbit number of the Sentinel-1 acquisition
coincident_size  Number of coincident tiles from S2
VH_thresh        Threshold applied to the Sentinel-1 VH band to classify water in the reference S1 classification
train_chip       Number of chips used for training
val_chip         Number of chips used for validation
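The metadata file can be read with Python's standard json module. This sketch assumes Sen1Floods11_Metadata.geojson is a GeoJSON FeatureCollection whose features store the fields above in their "properties" member; it also assumes that applying VH_thresh means labeling pixels with VH backscatter below the threshold as water, since open water typically appears dark in SAR imagery. Both are assumptions, and the helper names are hypothetical.

```python
import json
from typing import Any, Dict

import numpy as np


def load_event_metadata(geojson_text: str) -> Dict[Any, Dict[str, Any]]:
    """Map each event ID to its properties dict.

    Assumes a GeoJSON FeatureCollection whose features carry the metadata
    fields (ID, location, s1_date, VH_thresh, ...) in "properties".
    """
    collection = json.loads(geojson_text)
    return {f["properties"]["ID"]: f["properties"] for f in collection["features"]}


def vh_water_mask(vh_db: np.ndarray, vh_thresh: float) -> np.ndarray:
    """Flag pixels darker than the event's VH threshold (in dB) as water.

    Assumption: water means VH < threshold; NaN pixels compare False (not water).
    """
    return vh_db < vh_thresh
```

In practice you would call load_event_metadata(open("Sen1Floods11_Metadata.geojson").read()) and pass an event's VH_thresh together with the VH band (index 1) of one of its S1 chips.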