
Datasets for deep learning with satellite & aerial imagery

Datasets for deep learning applied to satellite and aerial imagery.

How to use this repository: if you know exactly what you are looking for (e.g. you have the paper name), use Control+F to search for it on this page (or search the raw markdown).

Lists of datasets

Remote sensing dataset hubs

Sentinel

As part of the EU Copernicus program, multiple Sentinel satellites are capturing imagery -> see Wikipedia

Landsat

Long-running US program -> see Wikipedia

VENμS

Vegetation and Environment monitoring on a New Micro-Satellite (VENμS)

Maxar

Satellites owned by Maxar (formerly DigitalGlobe) include GeoEye-1, WorldView-2, 3 & 4

  • Maxar Open Data Program provides pre- and post-event high-resolution satellite imagery in support of emergency planning, response, damage assessment, and recovery
  • WorldView-2 European Cities -> dataset covering the most populated areas in Europe at 40 cm resolution

Planet

UC Merced

Land use classification dataset with 21 classes and 100 RGB TIFF images for each class. Each image measures 256x256 pixels with a pixel resolution of 1 foot
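Tile size and pixel resolution together fix how much ground each sample covers, which matters when mixing datasets of different resolutions. A minimal sketch of that arithmetic using the UC Merced figures above; the helper names are illustrative, not from any library.

```python
FOOT_IN_METRES = 0.3048  # the international foot is defined as exactly 0.3048 m


def tile_side_metres(tile_pixels: int, gsd_metres: float) -> float:
    """Ground distance spanned by one side of a square tile."""
    return tile_pixels * gsd_metres


def tile_area_m2(tile_pixels: int, gsd_metres: float) -> float:
    """Ground area covered by a square tile, in square metres."""
    side = tile_side_metres(tile_pixels, gsd_metres)
    return side * side


# UC Merced: 256x256 pixel tiles at 1 foot per pixel, 21 classes x 100 images
side = tile_side_metres(256, FOOT_IN_METRES)
area = tile_area_m2(256, FOOT_IN_METRES)
n_images = 21 * 100
print(f"side = {side:.1f} m, area = {area:.0f} m^2, images = {n_images}")
```

Each UC Merced tile therefore covers roughly 78 m x 78 m of ground, and the dataset holds 2,100 images in total.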

EuroSAT

Land use classification dataset of Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with 27000 labeled and geo-referenced samples. Available in RGB and 13 band versions
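One reason to use the 13-band version rather than RGB is access to spectral indices such as NDVI, (NIR - red) / (NIR + red). A minimal per-pixel sketch, assuming the 13 bands are stored in the standard Sentinel-2 order B01 to B12 with B8A after B08; verify the order in the copy you download before trusting these indices. The reflectance values are hypothetical.

```python
# Assumed band order of the 13-band EuroSAT GeoTIFFs; check against your copy
S2_BANDS = ["B01", "B02", "B03", "B04", "B05", "B06", "B07",
            "B08", "B8A", "B09", "B10", "B11", "B12"]
RED = S2_BANDS.index("B04")  # red band
NIR = S2_BANDS.index("B08")  # broad near-infrared band


def ndvi(pixel: list[float]) -> float:
    """Normalized difference vegetation index for one 13-band pixel."""
    red, nir = float(pixel[RED]), float(pixel[NIR])
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero on all-zero (no-data) pixels
    return (nir - red) / total


# Hypothetical values: healthy vegetation reflects far more NIR than red
pixel = [0.0] * len(S2_BANDS)
pixel[RED], pixel[NIR] = 500.0, 3000.0
print(round(ndvi(pixel), 3))
```

Applied per pixel over a patch, this gives a single-band vegetation map that can be fed to a model alongside, or instead of, the raw bands.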

PatternNet

Land use classification dataset with 38 classes and 800 RGB JPG images for each class

Gaofen Image Dataset (GID) for classification

Million-AID

A large-scale benchmark dataset containing a million instances for RS scene classification, with 51 scene categories organized in a hierarchical category system

DIOR object detection dataset

A large-scale benchmark dataset for object detection in optical remote sensing images, which consists of 23,463 images and 192,518 object instances annotated with horizontal bounding boxes
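Predictions against horizontal bounding boxes like DIOR's are normally matched by intersection over union (IoU). A minimal sketch, assuming boxes have already been read into (xmin, ymin, xmax, ymax) tuples in pixel coordinates; parsing the dataset's own annotation files is left out.

```python
def hbb_iou(a, b):
    """Intersection over union of two horizontal boxes (xmin, ymin, xmax, ymax)."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give an empty intersection
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(hbb_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175
```

A detection is then typically counted as a true positive when its IoU with an unmatched ground-truth box exceeds a threshold such as 0.5.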

Multiscene

The MultiScene dataset targets two tasks: developing algorithms for multi-scene recognition, and network learning with noisy labels

FAIR1M object detection dataset

A Benchmark Dataset for Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery

  • arXiv paper
  • Download at gaofen-challenge.com
  • 2020Gaofen -> 2020 Gaofen Challenge data, baselines, and metrics

DOTA object detection dataset

A Large-Scale Benchmark and Challenges for Object Detection in Aerial Images. Segmentation annotations available in iSAID dataset
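DOTA labels objects with oriented quadrilaterals rather than horizontal boxes. A minimal parsing sketch, assuming the plain-text label layout of one object per line as `x1 y1 x2 y2 x3 y3 x4 y4 category difficult`, with metadata lines such as `imagesource:...` and `gsd:...` skipped; check this against the dataset version you download. The sample labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OrientedObject:
    points: list[tuple[float, float]]  # four corners, in label-file order
    category: str
    difficult: bool

    def to_hbb(self) -> tuple[float, float, float, float]:
        """Axis-aligned box (xmin, ymin, xmax, ymax) enclosing the quadrilateral."""
        xs = [x for x, _ in self.points]
        ys = [y for _, y in self.points]
        return min(xs), min(ys), max(xs), max(ys)


def parse_dota_labels(text: str) -> list[OrientedObject]:
    objects = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 9:  # blank lines and metadata such as "gsd:0.5"
            continue
        coords = [float(v) for v in parts[:8]]
        points = list(zip(coords[0::2], coords[1::2]))
        difficult = len(parts) > 9 and parts[9] == "1"
        objects.append(OrientedObject(points, parts[8], difficult))
    return objects


sample = """imagesource:GoogleEarth
gsd:0.5
10 20 50 20 50 60 10 60 plane 0
100 100 140 110 130 150 90 140 small-vehicle 1
"""
for obj in parse_dota_labels(sample):
    print(obj.category, obj.difficult, obj.to_hbb())
```

`to_hbb` is only the enclosing rectangle; it is a convenient way to reuse horizontal-box detectors on DOTA, at the cost of looser boxes for rotated objects.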

iSAID instance segmentation dataset

A Large-scale Dataset for Instance Segmentation in Aerial Images

HRSC RGB ship object detection dataset

SAR Ship Detection Dataset (SSDD)

High-Resolution SAR Rotation Ship Detection Dataset (SRSDD)

LEVIR ship dataset

A dataset for tiny ship detection under medium-resolution remote sensing images. Annotations in bounding box format

SAR Aircraft Detection Dataset

2,966 non-overlapping 224×224 slices containing 7,835 aircraft targets
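Cutting a large scene into non-overlapping fixed-size slices amounts to stepping a window by its own size. A minimal sketch; how the dataset authors handled scene edges is not stated here, so dropping partial tiles is an assumption, and the scene size is hypothetical.

```python
def non_overlapping_tiles(width: int, height: int, tile: int = 224):
    """Top-left (x, y) corners of full, non-overlapping tiles.

    Edge strips narrower than `tile` are dropped rather than padded.
    """
    return [
        (x, y)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]


# A hypothetical 1000 x 700 pixel SAR scene
corners = non_overlapping_tiles(1000, 700)
print(len(corners), corners[:3])

# Average target density implied by the dataset statistics above
print(round(7835 / 2966, 2))  # aircraft per slice
```

On average each slice therefore holds about 2.64 aircraft.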

xView1: Objects in context for overhead imagery

A fine-grained object detection dataset with 60 object classes organized in an ontology of 8 class types. Over 1,000,000 objects across more than 1,400 km² of 0.3 m resolution imagery. Annotations in bounding box format

xView2: xBD building damage assessment

Annotated high-resolution (0.3 m) satellite imagery for building damage assessment, with precise segmentation masks and damage labels on a four-level scale
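A four-level damage scale is naturally treated as ordinal rather than as unrelated classes. A minimal sketch, assuming the label strings `no-damage`, `minor-damage`, `major-damage` and `destroyed` used in the xBD annotations; any other string is skipped. The per-image summary is illustrative only.

```python
from collections import Counter

# Assumed xBD damage labels, ordered from least to most severe
DAMAGE_LEVELS = ["no-damage", "minor-damage", "major-damage", "destroyed"]
DAMAGE_TO_ORDINAL = {name: i for i, name in enumerate(DAMAGE_LEVELS)}


def summarise_damage(labels: list[str]) -> dict[str, object]:
    """Count buildings per damage level and report the worst level seen."""
    ordinals = [DAMAGE_TO_ORDINAL[name] for name in labels if name in DAMAGE_TO_ORDINAL]
    counts = Counter(DAMAGE_LEVELS[o] for o in ordinals)
    worst = DAMAGE_LEVELS[max(ordinals)] if ordinals else None
    return {"counts": dict(counts), "worst": worst}


# Hypothetical building labels for one post-event image
labels = ["no-damage", "minor-damage", "no-damage", "destroyed",
          "un-classified"]  # last one is not a damage level, so it is ignored
print(summarise_damage(labels))
```

Keeping the ordinal integers also makes it easy to use ordinal-aware losses or metrics that penalise "no-damage vs destroyed" more than "minor vs major".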

xView3: Detecting dark vessels in SAR

Detecting dark vessels engaged in illegal, unreported, and unregulated (IUU) fishing activities in synthetic aperture radar (SAR) imagery. With human- and algorithm-annotated instances of vessels and fixed infrastructure across 43,200,000 km² of Sentinel-1 imagery, this multi-modal dataset enables algorithms to detect and classify dark vessels

Vehicle Detection in Aerial Imagery (VEDAI)

Vehicle Detection in Aerial Imagery. Bounding box annotations

Cars Overhead With Context (COWC)

Large set of annotated cars from overhead. Established baseline for object detection and counting tasks. Annotations in bounding box format

AI-TOD & AI-TOD-v2 - tiny object detection

The mean size of objects in AI-TOD is about 12.8 pixels, much smaller than in other datasets. Annotations in bounding box format. V2 is a meticulous relabelling of the v1 dataset
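Tiny-object benchmarks usually define size by the square root of box area. A minimal sketch that buckets boxes this way, assuming the ranges we understand the AI-TOD paper to use (very tiny 2 to 8 px, tiny 8 to 16, small 16 to 32, medium 32 to 64); confirm them in the paper before reporting per-size metrics. The boxes are hypothetical.

```python
import math

# Assumed AI-TOD size ranges in pixels, as half-open intervals [low, high)
SIZE_BUCKETS = [
    ("very tiny", 2, 8),
    ("tiny", 8, 16),
    ("small", 16, 32),
    ("medium", 32, 64),
]


def size_bucket(width: float, height: float) -> str:
    """Classify a box by the square root of its area."""
    size = math.sqrt(width * height)
    for name, low, high in SIZE_BUCKETS:
        if low <= size < high:
            return name
    return "out of range"


def mean_size(boxes) -> float:
    """Mean of sqrt(w * h) over (width, height) pairs."""
    return sum(math.sqrt(w * h) for w, h in boxes) / len(boxes)


boxes = [(4, 5), (12, 14), (20, 18), (40, 50)]
print([size_bucket(w, h) for w, h in boxes])
print(round(mean_size(boxes), 1))
```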

RarePlanes

Counting from Sky

A Large-scale Dataset for Remote Sensing Object Counting and A Benchmark Method

AIRS (Aerial Imagery for Roof Segmentation)

Public dataset for roof segmentation from very-high-resolution aerial imagery (7.5cm). Covers almost the full area of Christchurch, the largest city in the South Island of New Zealand.

Inria building/not building segmentation dataset

RGB GeoTIFF at spatial resolution of 0.3 m. Data covering Austin, Chicago, Kitsap County, Western & Eastern Tyrol, Innsbruck, San Francisco & Vienna

AICrowd Mapping Challenge: building segmentation dataset

300x300 pixel RGB images with annotations in COCO format. Imagery appears to be global but with a significant fraction from North America

  • Dataset released as part of the mapping challenge
  • The winning solution, published by neptune.ai here, achieved precision 0.943 and recall 0.954 using a U-Net with a ResNet backbone.
  • mappingchallenge -> YOLOv5 applied to the AICrowd Mapping Challenge dataset
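With COCO-format building polygons, a quick sanity check is to compute footprint areas per image. A minimal sketch, assuming polygon segmentations stored as flat `[x1, y1, x2, y2, ...]` lists (COCO also allows RLE masks, which this does not handle); the annotation content below, including the category id, is hypothetical.

```python
import json

# A tiny hypothetical annotation file in COCO layout (polygon segmentation)
COCO_JSON = """
{
  "images": [{"id": 1, "file_name": "000001.jpg", "width": 300, "height": 300}],
  "categories": [{"id": 100, "name": "building"}],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 100,
     "segmentation": [[10, 10, 40, 10, 40, 30, 10, 30]]},
    {"id": 2, "image_id": 1, "category_id": 100,
     "segmentation": [[100, 100, 120, 100, 110, 120]]}
  ]
}
"""


def polygon_area(flat: list[float]) -> float:
    """Shoelace formula for a polygon given as [x1, y1, x2, y2, ...]."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    twice_area = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return abs(twice_area) / 2.0


def building_areas(coco: dict) -> dict[int, list[float]]:
    """Map each image id to the pixel areas of its building polygons."""
    areas = {img["id"]: [] for img in coco["images"]}
    for ann in coco["annotations"]:
        for poly in ann["segmentation"]:
            areas[ann["image_id"]].append(polygon_area(poly))
    return areas


coco = json.loads(COCO_JSON)
print(building_areas(coco))
```

As a worked check on the figures above, precision 0.943 and recall 0.954 correspond to an F1 score of 2PR/(P+R) ≈ 0.948.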

BONAI - building footprint dataset

BONAI (Buildings in Off-Nadir Aerial Images) is a dataset for building footprint extraction (BFE) in off-nadir aerial images

LEVIR-CD building change detection dataset

Onera (OSCD) Sentinel-2 change detection dataset

It comprises 24 pairs of multispectral images taken from the Sentinel-2 satellites between 2015 and 2018.
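Pixel-wise image differencing is a common naive baseline for bi-temporal change detection on pairs like these. A minimal sketch, not the method used by the dataset authors, assuming two co-registered single-band images of equal size and a hand-picked threshold; all values are hypothetical.

```python
def change_mask(before, after, threshold):
    """Flag pixels whose absolute difference exceeds `threshold`.

    `before` and `after` are equally sized 2-D lists of co-registered values.
    """
    if len(before) != len(after) or any(len(r0) != len(r1) for r0, r1 in zip(before, after)):
        raise ValueError("images must have the same shape")
    return [
        [abs(v1 - v0) > threshold for v0, v1 in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


def changed_fraction(mask) -> float:
    """Share of pixels flagged as changed."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# Hypothetical reflectance values from one band at two dates
before = [[0.10, 0.12], [0.30, 0.31]]
after = [[0.11, 0.45], [0.30, 0.05]]
mask = change_mask(before, after, threshold=0.1)
print(mask, changed_fraction(mask))
```

Learned change detectors are usually benchmarked against baselines of this kind, which are sensitive to seasonal and illumination differences between the two dates.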

SECOND - semantic change detection

Amazon and Atlantic Forest dataset

For semantic segmentation with Sentinel-2

Functional Map of the World (fMoW)

  • https://github.com/fMoW/dataset
  • RGB & multispectral variants
  • High resolution, chip classification dataset
  • Purpose: predicting the functional purpose of buildings and land use from temporal sequences of satellite images and a rich set of metadata features

HRSCD change detection

MiniFrance-DFC22 - semi-supervised semantic segmentation

FLAIR

Semantic segmentation and domain adaptation challenge proposed by the French National Institute of Geographical and Forest Information (IGN). Uses a dataset composed of over 70,000 aerial imagery patches with pixel-based annotations and 50,000 Sentinel-2 satellite acquisitions.

ISPRS

Semantic segmentation dataset. 38 patches of 6000x6000 pixels, each consisting of a true orthophoto (TOP) extracted from a larger TOP mosaic, and a DSM. Resolution 5 cm
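At 5 cm per pixel, each 6000x6000 patch spans 300 m x 300 m, too large to process in one pass, so such patches are commonly cut into overlapping tiles. A minimal sketch of tile start offsets along one axis; the 512-pixel tile and 384-pixel stride are hypothetical choices, and the last tile is shifted to sit flush with the image edge so no pixels are left uncovered.

```python
def window_starts(length: int, tile: int, stride: int) -> list[int]:
    """Start offsets of tiles of size `tile` covering [0, length)."""
    if tile >= length:
        return [0]  # a single (possibly padded) tile covers everything
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # final tile flush with the far edge
    return starts


starts = window_starts(6000, tile=512, stride=384)
print(len(starts), starts[:3], starts[-2:])
# Tiles per 6000x6000 patch (same offsets on both axes)
print(len(starts) ** 2)
```

The overlap lets predictions near tile borders be averaged or cropped away when stitching the full-patch segmentation back together.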

SpaceNet

SpaceNet is a series of competitions with datasets and utilities provided. The challenges covered are: (1 & 2) building segmentation, (3) road segmentation, (4) off-nadir buildings, (5) road network extraction, (6) multi-sensor mapping, (7) multi-temporal urban change, (8) Flood Detection Challenge Using Multiclass Segmentation

WorldStrat Dataset

Nearly 10,000 km² of free high-resolution satellite imagery of unique locations which ensure stratified representation of all types of land-use across the world: from agriculture to ice caps, from forests to multiple urbanization densities.

Satlas Pretrain

SatlasPretrain is a large-scale pre-training dataset for tasks that involve understanding satellite images. Regularly-updated satellite data is publicly available for much of the Earth through sources such as Sentinel-2 and NAIP, and can inform numerous applications from tackling illegal deforestation to monitoring marine infrastructure.

FLAIR 1 & 2 Segmentation datasets

  • https://ignf.github.io/FLAIR/
  • The FLAIR #1 semantic segmentation dataset consists of 77,412 high resolution patches (512x512 at 0.2 m spatial resolution) with 19 semantic classes
  • FLAIR #2 includes an expanded dataset of Sentinel-2 time series for multi-modal semantic segmentation

Five Billion Pixels segmentation dataset

RF100 object detection benchmark

RF100 is compiled from 100 real-world datasets that span a range of domains. The aim is for evaluation on this benchmark to give a more nuanced picture of how a model will perform across different domains. Contains 10k aerial images

SODA-A rotated bounding boxes

EarthView from Satellogic

Microsoft datasets

Google datasets

Google Earth Engine (GEE)

Since there is a whole community around GEE, I will not reproduce it here but will list a few select references. Get started at https://developers.google.com/earth-engine/

Image captioning datasets

  • RSICD -> 10921 images with five descriptive sentences per image. Used in Fine tuning CLIP with Remote Sensing (Satellite) images and captions, models at this repo
  • RSICC -> the Remote Sensing Image Change Captioning dataset contains 10077 pairs of bi-temporal remote sensing images and 50385 sentences describing the differences between images. Uses LEVIR-CD imagery
  • ChatEarthNet -> A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models, utilizes Sentinel-2 data with captions generated by ChatGPT
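Caption datasets like these pair each image (or image pair, for RSICC) with several sentences, so a common first step is grouping flat caption records by image. A minimal sketch with hypothetical records; the final line confirms that RSICC's 50385 sentences over 10077 image pairs is exactly five per pair.

```python
from collections import defaultdict


def group_captions(records):
    """Group (image_id, sentence) records into {image_id: [sentences]}."""
    grouped = defaultdict(list)
    for image_id, sentence in records:
        grouped[image_id].append(sentence.strip())
    return dict(grouped)


# Hypothetical flat caption records
records = [
    ("img_0001", "many buildings are near a river."),
    ("img_0001", "a river passes through a dense residential area."),
    ("img_0002", "a playground with a green football field."),
]
captions = group_captions(records)
print({k: len(v) for k, v in captions.items()})

# RSICC statistics from the list above: sentences per image pair
sentences, pairs = 50385, 10077
print(sentences / pairs)
```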

Weather Datasets

Cloud datasets

Forest datasets

Geospatial datasets

  • Resource Watch provides a wide range of geospatial datasets and a UI to visualise them

Time series & change detection datasets

  • BreizhCrops -> A Time Series Dataset for Crop Type Mapping
  • The SeCo dataset contains image patches from Sentinel-2 tiles captured at different timestamps at each geographical location. Download SeCo here
  • SYSU-CD -> The dataset contains 20000 pairs of 0.5-m aerial images of size 256×256 taken between the years 2007 and 2014 in Hong Kong

DEM (digital elevation maps)

  • Shuttle Radar Topography Mission, search online at usgs.gov
  • Copernicus Digital Elevation Model (DEM) on S3, represents the surface of the Earth including buildings, infrastructure and vegetation. Data is provided as Cloud Optimized GeoTIFFs. link
  • Awesome-DEM
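Slope is one of the most common derivatives computed from a DEM. A minimal sketch using central differences at an interior cell, assuming a square grid whose cell size is in the same unit as elevation; the 30 m grid below is hypothetical. Tools such as GDAL's `gdaldem slope` use a 3x3 Horn kernel by default, so results will differ slightly on real terrain.

```python
import math


def slope_degrees(dem, row, col, cell_size):
    """Slope at an interior cell of a 2-D elevation grid, in degrees.

    Uses central differences; `dem[row][col]` is elevation and `cell_size`
    is the grid spacing in the same unit (e.g. metres).
    """
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Hypothetical 3x3 patch on a 30 m grid, rising 30 m per cell eastwards
dem = [
    [100.0, 130.0, 160.0],
    [100.0, 130.0, 160.0],
    [100.0, 130.0, 160.0],
]
print(round(slope_degrees(dem, 1, 1, cell_size=30.0), 1))
```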

UAV & Drone datasets

Other datasets

Kaggle

Kaggle hosts over 200 satellite image datasets, search results here. The Kaggle blog is an interesting read.

Kaggle - Amazon from space - classification challenge

Kaggle - DSTL segmentation challenge

Kaggle - DeepSat land cover classification

Kaggle - Airbus ship detection challenge

Kaggle - Shipsnet classification dataset

Kaggle - Ships in Google Earth

Kaggle - Ships in San Francisco Bay

Kaggle - Swimming pool and car detection using satellite imagery

Kaggle - Planesnet classification dataset

Kaggle - CGI Planes in Satellite Imagery w/ BBoxes

Kaggle - Draper challenge to place images in order of time

Kaggle - Dubai segmentation

Kaggle - Massachusetts Roads & Buildings Datasets - segmentation

Kaggle - Deepsat classification challenge

Not satellite but airborne imagery. Each image patch is size-normalized to 28x28 pixels and consists of 4 bands: red, green, blue and near infrared. The training and test labels are one-hot encoded vectors (1x4 for Sat4, 1x6 for Sat6). Data in .mat (MATLAB) format.

  • Sat4: 500,000 image patches covering four broad land cover classes: barren land, trees, grassland, and a class that consists of all land cover classes other than the above three
  • Sat6: 405,000 image patches, each of size 28x28, covering 6 land cover classes: barren land, trees, grassland, roads, buildings and water bodies.
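Because the labels are one-hot vectors, recovering a class name is an argmax over the vector. A minimal sketch; the class order below simply follows the Sat6 list above and is an assumption, so check it against the annotation file shipped with the data.

```python
# Assumed Sat6 class order (taken from the list above, not from the dataset files)
SAT6_CLASSES = ["barren land", "trees", "grassland", "roads", "buildings", "water bodies"]


def decode_one_hot(vector, classes=SAT6_CLASSES):
    """Return the class name for a one-hot (or probability) label vector."""
    if len(vector) != len(classes):
        raise ValueError(f"expected {len(classes)} values, got {len(vector)}")
    best = max(range(len(vector)), key=lambda i: vector[i])
    return classes[best]


print(decode_one_hot([0, 0, 0, 1, 0, 0]))
print(decode_one_hot([0.05, 0.7, 0.1, 0.05, 0.05, 0.05]))
```

The same function decodes model softmax outputs, which makes it handy for inspecting predictions next to ground truth.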

Kaggle - High resolution ship collections 2016 (HRSC2016)

Kaggle - SWIM-Ship Wake Imagery Mass

Kaggle - Understanding Clouds from Satellite Images

In this challenge, you will build a model to classify cloud organization patterns from satellite images.

Kaggle - 38-Cloud Cloud Segmentation

Kaggle - Airbus Aircraft Detection Dataset

Kaggle - Airbus oil storage detection dataset

Kaggle - Satellite images of hurricane damage

Kaggle - Austin Zoning Satellite Images

Kaggle - Statoil/C-CORE Iceberg Classifier Challenge

Classify the target in a SAR image chip as either a ship or an iceberg. The dataset for the competition included 5000 images extracted from multichannel SAR data collected by the Sentinel-1 satellite. Top entries used ensembles to boost prediction accuracy from about 92% to 97%.
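The simplest ensemble averages the predicted probabilities of several models before thresholding. A minimal sketch of that idea, not a reconstruction of any specific winning entry, where each hypothetical model outputs P(iceberg) for each image chip:

```python
def average_ensemble(model_probs):
    """Mean probability per sample across models (rows = models)."""
    n_models = len(model_probs)
    return [sum(col) / n_models for col in zip(*model_probs)]


def accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [int(p >= threshold) for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical P(iceberg) from three models on four chips (1 = iceberg, 0 = ship)
model_probs = [
    [0.9, 0.6, 0.2, 0.7],
    [0.7, 0.3, 0.6, 0.8],
    [0.8, 0.2, 0.1, 0.4],
]
labels = [1, 0, 0, 1]

for i, probs in enumerate(model_probs):
    print(f"model {i}: accuracy {accuracy(probs, labels):.2f}")
print(f"ensemble: accuracy {accuracy(average_ensemble(model_probs), labels):.2f}")
```

Here each model errs on a different chip, so averaging cancels the individual mistakes; ensembles help most when member errors are weakly correlated.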

Kaggle - Land Cover Classification Dataset from DeepGlobe Challenge - segmentation

Kaggle - Next Day Wildfire Spread

A Data Set to Predict Wildfire Spreading from Remote-Sensing Data

Kaggle - Satellite Next Day Wildfire Spread

Inspired by the above dataset, using different data sources

Kaggle - Spacenet 7 Multi-Temporal Urban Change Detection

Kaggle - Satellite Images to predict poverty in Africa

Kaggle - NOAA Fisheries Steller Sea Lion Population Count

Kaggle - Arctic Sea Ice Image Masking

Kaggle - Overhead-MNIST

Kaggle - Satellite Image Classification

Kaggle - EuroSAT - Sentinel-2 Dataset

Kaggle - Satellite Images of Water Bodies

Kaggle - NOAA sea lion count

Kaggle - miscellaneous

Competitions

Competitions are an excellent source for accessing clean, ready-to-use satellite datasets and model benchmarks.