Introduction
This document lists resources for performing deep learning (DL) on satellite imagery. To a lesser extent, classical machine learning (ML, e.g. random forests) is also discussed, as are classical image processing techniques.
Table of contents
- Top links
- Datasets
- Interesting deep learning projects
- Techniques
- Image formats and catalogues
- State of the art
- Online platforms for Geo analysis
- Free online computing resources
- Production
- Useful open source software
- Movers and shakers on Github
- Courses
- Online communities
- Companies
- Jobs
- Neural nets in space
- About the author
Top links
- awesome-satellite-imagery-datasets
- awesome-earthobservation-code
- awesome-sentinel
- A modern geospatial workflow
- geospatial-machine-learning
- Long list of satellite missions with example imagery
- AWS datasets
Datasets
- Warning: satellite image files can be LARGE; even a small dataset may comprise 50 GB of imagery
- Various datasets listed here and at awesome-satellite-imagery-datasets
WorldView
- A commercial satellite owned by DigitalGlobe
- https://en.wikipedia.org/wiki/WorldView-3
- 0.3 m PAN, 1.24 m MS, 3.7 m SWIR. Off-nadir (stereo) imagery available.
- Getting Started with SpaceNet
- Dataset on AWS -> see this getting started notebook and this notebook on the off-Nadir dataset
- Package of utilities to assist working with the SpaceNet dataset.
- WorldView cloud optimized geotiffs used in the 3D modelling notebook here.
- For more WorldView imagery see the Kaggle DSTL competition.
Sentinel
- As part of the EU Copernicus program, multiple Sentinel satellites are capturing imagery -> see wikipedia.
- Sentinel-2: 13 bands, spatial resolution of 10 m, 20 m and 60 m, 290 km swath, temporal resolution of 5 days
- awesome-sentinel - a curated list of awesome tools, tutorials and APIs related to data from the Copernicus Sentinel Satellites.
- Sentinel-2 Cloud-Optimized GeoTIFFs
- Open access data on GCP
- Paid access via sentinel-hub and python-api.
- Example loading sentinel data in a notebook
- so2sat on Tensorflow datasets - So2Sat LCZ42 is a dataset consisting of co-registered synthetic aperture radar and multispectral optical image patches acquired by the Sentinel-1 and Sentinel-2 remote sensing satellites, and the corresponding local climate zones (LCZ) label. The dataset is distributed over 42 cities across different continents and cultural regions of the world.
- eurosat - EuroSAT dataset is based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with 27000 labeled and geo-referenced samples. Dataset and usage in EuroSAT: Land Use and Land Cover Classification with Sentinel-2, where a CNN achieves a classification accuracy of 98.57%.
- bigearthnet - The BigEarthNet is a new large-scale Sentinel-2 benchmark archive, consisting of 590,326 Sentinel-2 image patches. The image patch size on the ground is 1.2 x 1.2 km with variable image size depending on the channel resolution. This is a multi-label dataset with 43 imbalanced labels.
- Jupyter Notebooks for working with Sentinel-5P Level 2 data stored on S3. The data can be browsed here
- Sentinel NetCDF data
Landsat
- Long running US program -> see Wikipedia and read the official webpage
- 8 bands (Landsat 7), 15 to 60 m resolution, 185 km swath, temporal resolution of 16 days
- DECEMBER 2020: USGS publishes Landsat Collection 2 Dataset with 'significant geometric and radiometric improvements'. COG and STAC data format. Announcement and website. Beware that data on Google and AWS (below) may be in different formats.
- Landsat 4, 5, 7, and 8 imagery on Google, see the GCP bucket here, with Landsat 8 imagery in COG format analysed in this notebook
- Landsat 8 imagery on AWS, with many tutorials and tools listed
- https://github.com/kylebarron/landsat-mosaic-latest -> Auto-updating cloudless Landsat 8 mosaic from AWS SNS notifications
- Visualise landsat imagery using Datashader
- Landsat-mosaic-tiler -> This repo hosts all the code for the landsatlive.live website and APIs.
Spacenet
- Spacenet is an online hub for data, challenges, algorithms, and tools.
- spacenet.ai website covering the series of SpaceNet challenges, lots of useful resources (blog, video and papers)
- The SpaceNet 7 Multi-Temporal Urban Development Challenge: Dataset Release
- SpaceNet - WorldView-3 and article here. Also example semantic segmentation using Raster Vision
Planet
- Planet’s high-resolution, analysis-ready mosaics of the world’s tropics, supported through Norway’s International Climate & Forests Initiative. BBC coverage
Shuttle Radar Topography Mission (digital elevation maps)
Kaggle
Kaggle hosts over 60 satellite image datasets, search results here. The Kaggle blog is an interesting read.
Kaggle - Amazon from space - classification challenge
- https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/data
- 3-5 meter resolution GeoTIFF images from planet Dove satellite constellation
- 12 classes including - cloudy, primary + waterway etc
- 1st place winner interview - used 11 custom CNNs
- FastAI Multi-label image classification
Kaggle - DSTL - segmentation challenge
- https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection
- Rating - medium, many good examples (see the Discussion as well as kernels), but as this competition was run a couple of years ago many examples use Python 2
- WorldView 3 - 45 satellite images covering 1km x 1km in both 3 (i.e. RGB) and 16-band (400nm - SWIR) images
- 10 Labelled classes include - Buildings, Road, Trees, Crops, Waterway, Vehicles
- Interview with 1st place winner who used segmentation networks - 40+ models, each tweaked for particular target (e.g. roads, trees)
- Deepsense 4th place solution
- My analysis here
Kaggle - Airbus Ship Detection Challenge
- https://www.kaggle.com/c/airbus-ship-detection/overview
- Rating - medium, most solutions use deep learning, many kernels, good example kernel.
- I believe there was a problem with this dataset, which led to many complaints that the competition was ruined.
Kaggle - Draper - place images in order of time
- https://www.kaggle.com/c/draper-satellite-image-chronology/data
- Rating - hard. Not many useful kernels.
- Images are grouped into sets of five, each of which has the same setId. Each image in a set was taken on a different day (but not necessarily at the same time each day). The images for each set cover approximately the same area but are not exactly aligned.
- Kaggle interviews for entrants who used XGBoost and a hybrid human/ML approach
Kaggle - Deepsat - classification challenge
Not satellite but airborne imagery. Each sample image is 28x28 pixels and consists of 4 bands - red, green, blue and near infrared. The training and test labels are one-hot encoded 1x6 vectors. Each image patch is size normalized to 28x28 pixels. Data is in .mat (MATLAB) format. JPEG?
- Imagery source
- Sat4 500,000 image patches covering four broad land cover classes - barren land, trees, grassland and a class that consists of all land cover classes other than the above three
- Sat6 405,000 image patches each of size 28x28 and covering 6 landcover classes - barren land, trees, grassland, roads, buildings and water bodies.
- Deep Gradient Boosted Learning article
Kaggle - Understanding Clouds from Satellite Images
In this challenge, you will build a model to classify cloud organization patterns from satellite images.
- https://www.kaggle.com/c/understanding_cloud_organization/
- 3rd place solution on Github by naivelamb
Kaggle - miscellaneous
- https://www.kaggle.com/reubencpereira/spatial-data-repo -> Satellite + loan data
- https://www.kaggle.com/towardsentropy/oil-storage-tanks -> Image data of industrial tanks with bounding box annotations, estimate tank fill % from shadows
Alternative datasets
There are a variety of datasets suitable for land classification problems.
Tensorflow datasets
- There are a number of remote sensing datasets
- resisc45 - RESISC45 dataset is a publicly available benchmark for Remote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class.
- eurosat - EuroSAT dataset is based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with 27000 labeled and geo-referenced samples.
- bigearthnet - The BigEarthNet is a new large-scale Sentinel-2 benchmark archive, consisting of 590,326 Sentinel-2 image patches. The image patch size on the ground is 1.2 x 1.2 km with variable image size depending on the channel resolution. This is a multi-label dataset with 43 imbalanced labels.
UCMerced
- http://weegee.vision.ucmerced.edu/datasets/landuse.html
- Available as a Tensorflow dataset -> https://www.tensorflow.org/datasets/catalog/uc_merced
- This is a 21 class land use image dataset meant for research purposes.
- There are 100 RGB TIFF images for each class
- Each image measures 256x256 pixels with a pixel resolution of 1 foot
- Image classification of UCMerced using Keras or alternatively fastai
AWS datasets
- Landsat -> free viewer at remotepixel and libra
- Optical, radar, segmented etc. https://aws.amazon.com/earth/
- Spacenet data is hosted on S3
Quilt
- Several people have uploaded datasets to Quilt
Google Earth Engine
- https://developers.google.com/earth-engine/
- Various imagery and climate datasets, including Landsat & Sentinel imagery
- Python API but all compute happens on Google's servers
- Google Earth Engine Community on Github
- awesome-google-earth-engine - Curated list of Google Earth Engine resources
- ee-tensorflow-notebooks - Repository to place example notebooks for Deep Learning applications with TensorFlow and Earth Engine.
Weather Datasets
- UK Met Office -> https://www.metoffice.gov.uk/datapoint
- NASA (make a request and you are emailed when the data is ready) -> https://search.earthdata.nasa.gov
- NOAA (requires BigQuery) -> https://www.kaggle.com/noaa/goes16/home
- Time series weather data for several US cities -> https://www.kaggle.com/selfishgene/historical-hourly-weather-data
UAV & Drone datasets
- Many on https://www.visualdata.io
- AU-AIR dataset -> a multi-modal UAV dataset for object detection.
- ERA -> A Dataset and Deep Learning Benchmark for Event Recognition in Aerial Videos.
- Aerial Maritime Drone Dataset
- Stanford Drone Dataset
- RetinaNet for pedestrian detection
- EmergencyNet - identify fire and other emergencies from a drone
- OpenDroneMap - generate maps, point clouds, 3D models and DEMs from drone, balloon or kite images.
Synthetic data
- The Synthinel-1 dataset: a collection of high resolution synthetic overhead imagery for building segmentation
- RarePlanes -> incorporates both real and synthetically generated satellite imagery including aircraft.
Interesting deep learning projects
Raster Vision by Azavea
- https://www.azavea.com/projects/raster-vision/
- An open source Python framework for building computer vision models on aerial, satellite, and other large imagery sets.
- Accessible through the Raster Foundry
- Example use cases on open data
RoboSat
- https://github.com/mapbox/robosat
- Semantic segmentation on aerial and satellite imagery. Extracts features such as: buildings, parking lots, roads, water, clouds
- robosat-jupyter-notebook -> walks through all of the steps in an excellent blog post on the Robosat feature extraction and machine learning pipeline.
- Note there is/was a fork of RoboSat, originally named RoboSat.pink, and subsequently https://neat-EO.pink although this appears to be down/archived
DeepOSM
- https://github.com/trailbehind/DeepOSM
- Train a deep learning net with OpenStreetMap features and satellite imagery.
DeepNetsForEO - segmentation
- https://github.com/nshaud/DeepNetsForEO
- Uses SegNet for working on remote sensing images using deep learning.
Skynet-data
- https://github.com/developmentseed/skynet-data
- Data pipeline for machine learning with OpenStreetMap
Techniques
This section explores the different techniques (DL, ML & classical) people are applying to common problems in satellite imagery analysis. Classification problems are the most simply addressed via DL, object detection is harder, and cloud detection harder still (niche interest).
Land classification
Assign a label to an image, e.g. this is an image of a forest.
- Land classification using a simple sklearn cluster algorithm or deep learning.
- Land use is related to classification, but we are trying to detect a scene, e.g. housing, forestry. I have tried CNNs -> See my notebooks
- Land Use Classification using Convolutional Neural Network in Keras
- Sea-Land segmentation using DL
- Pixel level segmentation on Azure
- Deep Learning-Based Classification of Hyperspectral Data
- A U-net based on Tensorflow for object detection (or segmentation) of satellite images - DSTL dataset but Python 2.7
- What’s growing there? Using eo-learn and fastai to identify crops from multi-spectral remote sensing data (Sentinel 2)
- FastAI Multi-label image classification
- Land use classification using Keras
- Detecting Informal Settlements from Satellite Imagery using fine-tuning of ResNet-50 classifier with repo
- Image classification of UCMerced using Keras or alternatively fastai
- Water Detection in High Resolution Satellite Images using the waterdetect python package -> The main idea is to combine water indexes (NDWI, MNDWI, etc.) with reflectance bands (NIR, SWIR, etc.) into an automated clustering process
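The waterdetect idea of combining water indexes with reflectance bands in an automated clustering step can be sketched in plain NumPy. This is a hypothetical illustration, not the waterdetect package's API: it assumes co-registered green, NIR and SWIR bands given as surface reflectance scaled to 0-1, uses the McFeeters NDWI, (green - NIR) / (green + NIR), and the Xu MNDWI, (green - SWIR) / (green + SWIR), and simply labels as water the cluster with the highest mean NDWI.

```python
import numpy as np


def normalized_difference(a, b, eps=1e-10):
    """(a - b) / (a + b), computed in float64 and guarded against division by zero."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    return (a - b) / (a + b + eps)


def kmeans(features, k=2, n_iter=20, seed=0):
    """Plain Lloyd's k-means on an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Distance from every sample to every centre -> shape (n_samples, k)
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its previous centre
                centers[j] = features[labels == j].mean(axis=0)
    return labels, centers


def water_mask(green, nir, swir, k=2):
    """Cluster pixels on [NDWI, MNDWI, NIR, SWIR]; the cluster with the highest mean NDWI is water."""
    ndwi = normalized_difference(green, nir)    # McFeeters NDWI
    mndwi = normalized_difference(green, swir)  # Xu MNDWI
    feats = np.stack([ndwi, mndwi, nir, swir], axis=-1).reshape(-1, 4).astype(np.float64)
    # Standardise each feature so reflectances and indexes carry similar weight
    std = feats.std(axis=0)
    feats = (feats - feats.mean(axis=0)) / np.where(std > 0, std, 1.0)
    labels, centers = kmeans(feats, k=k)
    water_cluster = centers[:, 0].argmax()      # column 0 holds (standardised) NDWI
    return (labels == water_cluster).reshape(np.shape(green))
```

Two clusters is the simplest possible case; a real scene containing cloud, shadow and turbid water will usually need more clusters and a smarter rule for picking the water cluster(s). The all-pairs distance array also grows with the pixel count, so large scenes should be subsampled or processed in tiles.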
Semantic segmentation
Whilst classification will assign a label to a whole image, semantic segmentation will assign a label to each pixel
- Instance segmentation with keras - links to satellite examples
- Semantic Segmentation on Aerial Images using fastai
- https://github.com/Paulymorphous/Road-Segmentation
- UNSOAT used fast.ai to train a U-Net to perform semantic segmentation on satellite imagery to detect water - paper + notebook, accuracy 0.97, precision 0.91, recall 0.92.
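Scores such as the accuracy, precision and recall quoted for UNSOAT are pixel-wise metrics on the predicted mask. A minimal NumPy sketch of how they are typically computed for a binary mask (True = target class, e.g. water); this is a generic illustration, not UNSOAT's own evaluation code.

```python
import numpy as np


def pixel_metrics(pred, truth):
    """Pixel-wise accuracy, precision and recall for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)     # target pixels correctly predicted
    fp = np.sum(pred & ~truth)    # background predicted as target
    fn = np.sum(~pred & truth)    # target pixels that were missed
    tn = np.sum(~pred & ~truth)   # background correctly predicted
    accuracy = (tp + tn) / pred.size
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(accuracy), float(precision), float(recall)
```

Accuracy can look flattering when the target class is rare (a mask that is mostly land scores high accuracy even if no water is found), which is why precision and recall are reported alongside it.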
Change detection
Monitor water levels, coastlines, size of urban areas, wildfire damage. Note that clouds change often too!
- Using PCA (Python 2, requires updating) -> https://appliedmachinelearning.blog/2017/11/25/unsupervised-changed-detection-in-multi-temporal-satellite-images-using-pca-k-means-python-code/
- Using CNN -> https://github.com/vbhavank/Unstructured-change-detection-using-CNN
- Siamese neural network to detect changes in aerial images
- https://www.spaceknow.com/
- LANDSAT Time Series Analysis for Multi-temporal Land Cover Classification using Random Forest
- Change Detection in 3D: Generating Digital Elevation Models from Dove Imagery
- Change Detection in Hyperspectral Images Using Recurrent 3D Fully Convolutional Networks
- PySAR - InSAR (Interferometric Synthetic Aperture Radar) timeseries analysis in python
- QGIS 2 plugin for applying change detection algorithms on high resolution satellite imagery
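The PCA approach linked above (PCA features from the difference image, then k-means with two clusters) can be sketched in NumPy for two co-registered single-band images. This is a simplified, hypothetical variant that departs from the published method in a few places; the block size, number of components and the rule "the cluster with the larger mean difference is change" are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def pca_kmeans_change(img1, img2, block=3, n_components=3, n_iter=30):
    """Unsupervised change map: PCA on difference-image neighbourhoods, then 2-means."""
    diff = np.abs(img1.astype(np.float64) - img2.astype(np.float64))
    if diff.max() == diff.min():          # nothing to separate
        return np.zeros(diff.shape, dtype=bool)
    pad = block // 2
    padded = np.pad(diff, pad, mode="reflect")
    # One block x block neighbourhood per pixel, flattened -> (H*W, block*block)
    patches = sliding_window_view(padded, (block, block)).reshape(-1, block * block)
    centered = patches - patches.mean(axis=0)
    # Principal directions from the SVD of the centred patch matrix
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    feats = centered @ vt[:n_components].T
    # Initialise the two centres at the least and most changed neighbourhoods
    mean_diff = patches.mean(axis=1)
    centers = feats[[mean_diff.argmin(), mean_diff.argmax()]].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(2):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    # The cluster whose members have the larger mean difference is "changed"
    means = [mean_diff[labels == j].mean() if np.any(labels == j) else -np.inf for j in range(2)]
    changed = int(np.argmax(means))
    return (labels == changed).reshape(diff.shape)
```

Because every pixel is assigned to one of two clusters, the method always reports some change when the images differ at all; radiometric normalisation between dates (and cloud masking) matters more in practice than the clustering details.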
Image registration
Image registration is the process of transforming different sets of data into one coordinate system. Typical use is overlapping images taken at different times or with different cameras.
- Wikipedia article on registration -> register for change detection or image stitching
- Traditional approach -> define control points, employ RANSAC algorithm
- Phase correlation is used to estimate the translation between two images with sub-pixel accuracy. It is useful for accurately registering low resolution imagery onto high resolution imagery, or for registering a sub-image on a full image -> Unlike many spatial-domain algorithms, the phase correlation method is resilient to noise, occlusions, and other defects. Applied to Landsat images here.
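The core of phase correlation fits in a few lines of NumPy: normalise the cross-power spectrum of the two images so only phase remains, inverse-transform it, and read the shift off the location of the peak. This sketch recovers integer-pixel shifts only and assumes equally sized single-band arrays; for the sub-pixel accuracy mentioned above, scikit-image's `phase_cross_correlation` offers an `upsample_factor` argument.

```python
import numpy as np


def phase_correlation(ref, moving):
    """Estimate the integer (row, col) shift such that moving ~= np.roll(ref, shift, axis=(0, 1))."""
    f_ref = np.fft.fft2(ref)
    f_mov = np.fft.fft2(moving)
    cross_power = f_mov * np.conj(f_ref)
    cross_power /= np.abs(cross_power) + 1e-12    # keep phase only
    corr = np.fft.ifft2(cross_power).real         # ideally a delta at the shift
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative (wrapped-around) shifts
    shift = [p if p <= n // 2 else p - n for p, n in zip(peak, corr.shape)]
    return tuple(int(s) for s in shift)


rng = np.random.default_rng(0)
scene = rng.random((64, 64))
shifted = np.roll(scene, (5, -3), axis=(0, 1))
print(phase_correlation(scene, shifted))  # (5, -3)
```

Real image pairs are not circularly shifted, so in practice a window function (e.g. Hann) is usually applied before the FFT to suppress edge effects.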
Object detection
A good introduction to the challenge of performing object detection on aerial imagery is given in this paper. In summary, images are large and objects may comprise only a few pixels, which are easily confused with random features in the background. An example task is detecting boats on the ocean, which should be simpler than land based detection owing to the relatively blank background in images, but is still challenging.
- Intro articles here and here.
- DigitalGlobe article - they use a combination of classical techniques (masks, erodes) to reduce the search space (identifying water via NDWI, which requires SWIR), then apply a binary DL classifier on candidate regions of interest. They deploy the final algorithm as a task on their GBDX platform. They propose that in the future an R-CNN may be suitable for the whole process.
- Planet use the non-DL Felzenszwalb algorithm to detect ships
- Segmentation of buildings on kaggle
- Identifying Buildings in Satellite Images with Machine Learning and Quilt -> NDVI & edge detection via gaussian blur as features, fed to TPOT for training with labels from OpenStreetMap, modelled as a two class problem, “Buildings” and “Nature”.
- Deep learning for satellite imagery via image segmentation
- Building Extraction with YOLT2 and SpaceNet Data
- Find sports fields using Mask R-CNN and overlay on open-street-map
- Detecting solar panels from satellite imagery
- Anomaly Detection on Mars using a GAN
- Tackling the Small Object Problem in Object Detection
- Satellite Imagery Multiscale Rapid Detection with Windowed Networks (SIMRDWN) -> combines some of the leading object detection algorithms into a unified framework designed to detect objects both large and small in overhead imagery
- 2020 Nature paper - An unexpectedly large count of trees in the West African Sahara and Sahel -> tree detection framework based on U-Net & tensorflow 2 with code here
- Truck Detection with Sentinel-2 during COVID-19 crisis -> moving objects in Sentinel-2 data cause a specific reflectance relationship in the RGB bands, which looks like a rainbow and serves as a marker for trucks. Accuracy is improved by only analysing roads.
- Counting-Trees-using-Satellite-Images -> create an inventory of incoming and outgoing trees for annual tree inspections, uses Keras
- Several useful articles on awesome-tiny-object-detection
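Because scenes are large and objects may cover only a few pixels, a common approach is to cut the scene into overlapping chips at native resolution and run the detector on each chip, rather than downsampling the whole image. A minimal, hypothetical tiling helper follows; the 512-pixel tile and 64-pixel overlap are arbitrary assumptions, and the overlap should be at least as large as the biggest object you expect so that every object appears whole in some chip.

```python
def tile_windows(height, width, tile=512, overlap=64):
    """Yield (row_off, col_off, h, w) windows that cover an image with overlapping tiles."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    stride = tile - overlap

    def offsets(size):
        if size <= tile:
            return [0]
        offs = list(range(0, size - tile, stride))
        offs.append(size - tile)  # final tile sits flush with the image edge
        return offs

    for row in offsets(height):
        for col in offsets(width):
            yield row, col, min(tile, height), min(tile, width)


for window in tile_windows(1000, 600):
    print(window)
```

Detections from neighbouring chips will duplicate objects that fall inside the overlap, so the per-chip boxes need shifting by the chip offsets and merging (for example with non-maximum suppression) afterwards.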
Cloud detection
A subset of the object detection problem, but surprisingly challenging
- From this article on sentinelhub, there are three popular classical algorithms that apply thresholds to multiple bands in order to identify clouds. In the same article they propose using semantic segmentation combined with a CNN for a cloud classifier (excellent review paper here), but state that this requires too many compute resources.
- This article compares a number of ML algorithms: random forests, stochastic gradient descent, support vector machines and Bayesian methods.
Wealth and economic activity measurement
The goal is to predict economic activity from satellite imagery rather than conducting labour-intensive ground surveys.
- Using publicly available satellite imagery and deep learning to understand economic well-being in Africa, Nature Comms 22 May 2020 -> Used a CNN on Landsat imagery (night & day) to predict asset wealth of African villages
- Combining Satellite Imagery and machine learning to predict poverty -> review article
- Measuring Human and Economic Activity from Satellite Imagery to Support City-Scale Decision-Making during COVID-19 Pandemic
- Predicting Food Security Outcomes Using CNNs for Satellite Tasking
- Crop yield Prediction with Deep Learning -> The necessary code for the paper Deep Gaussian Process for Crop Yield Prediction Based on Remote Sensing Data, AAAI 2017 (Best Student Paper Award in Computational Sustainability Track).
- https://github.com/taspinar/sidl/blob/master/notebooks/2_Detecting_road_and_roadtypes_in_sattelite_images.ipynb
Super resolution
Super-resolution imaging is a class of techniques that enhance the resolution of an imaging system. A very active research topic.
- https://medium.com/the-downlinq/super-resolution-on-satellite-imagery-using-deep-learning-part-1-ec5c5cd3cd2 -> Nov 2016 blog post by CosmiQ Works with a nice introduction to the topic. Proposes and demonstrates a new architecture with perturbation layers with practical guidance on the methodology and code. Three part series
- Super Resolution for Satellite Imagery - srcnn repo
- TensorFlow implementation of "Accurate Image Super-Resolution Using Very Deep Convolutional Networks" adapted for working with geospatial data
- Random Forest Super-Resolution (RFSR repo) including sample data
- Super-Resolution (python) Utilities for managing large satellite images
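Super-resolution models are commonly evaluated by downsampling a high resolution image, upscaling it again, and scoring the result against the original with PSNR, using naive upscaling as the baseline to beat. A generic NumPy sketch of that evaluation (not code from the repos above); `data_range=1.0` assumes reflectance scaled to 0-1.

```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means the estimate is closer to the reference."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10 * np.log10(data_range ** 2 / mse))


def nearest_upsample(img, factor):
    """Naive baseline: repeat each pixel `factor` times along both axes."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)


rng = np.random.default_rng(0)
high_res = rng.random((64, 64))
low_res = high_res[::2, ::2]                      # simulate a 2x coarser sensor
baseline = nearest_upsample(low_res, 2)
print(f"baseline PSNR: {psnr(high_res, baseline):.2f} dB")
```

A model is only useful if it beats this baseline (and bicubic interpolation) by a clear margin; PSNR also rewards blurry outputs, so visual inspection or task-based metrics are often reported too.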
Pansharpening
Fusion of a low resolution multispectral image with a high resolution panchromatic (pan) band.
- Several algorithms are described in the ArcGIS docs, the simplest being to take the mean of the pan and RGB pixel values.
- Does not require DL, classical algos suffice, see this notebook and this kaggle kernel
- https://github.com/mapbox/rio-pansharpen
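The simple mean method described above, plus a plain Brovey transform (no band weights) for comparison, can be sketched in NumPy. This assumes the multispectral stack (bands, h, w) and the pan band are already co-registered and cover the same extent; the nearest-neighbour resampling is a simplifying assumption, and tools such as rio-pansharpen handle resampling and georeferencing properly.

```python
import numpy as np


def upsample_to(ms, shape):
    """Nearest-neighbour resample a (bands, h, w) multispectral stack onto the pan grid."""
    _, h, w = ms.shape
    rows = np.arange(shape[0]) * h // shape[0]
    cols = np.arange(shape[1]) * w // shape[1]
    return ms[:, rows[:, None], cols[None, :]]


def pansharpen_mean(ms, pan):
    """Simple mean: each output band = 0.5 * (upsampled MS band + pan)."""
    ms_up = upsample_to(ms.astype(np.float64), pan.shape)
    return 0.5 * (ms_up + pan[None, :, :])


def pansharpen_brovey(ms, pan, eps=1e-10):
    """Plain Brovey: scale each band by pan / (sum of the upsampled MS bands)."""
    ms_up = upsample_to(ms.astype(np.float64), pan.shape)
    return ms_up * pan[None, :, :] / (ms_up.sum(axis=0, keepdims=True) + eps)
```

The mean method is trivial but dilutes colour; Brovey preserves the relative band ratios of each pixel while taking its brightness from the pan band.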
Stereo imaging for terrain mapping & DEMs
Measure surface contours.
- Wikipedia DEM article and phase correlation article
- Intro to depth from stereo
- Map terrain from stereo images to produce a digital elevation model (DEM) -> high resolution & paired images required, typically 0.3 m, e.g. Worldview or GeoEye.
- Process of creating a DEM here and here.
- ArcGIS can generate DEMs from stereo images
- https://github.com/MISS3D/s2p -> produces elevation models from images taken by high resolution optical satellites -> demo code on https://gfacciol.github.io/IS18/
- Automatic 3D Reconstruction from Multi-Date Satellite Images
- Semi-global matching with neural networks
- Predict the fate of glaciers
- monodepth - Unsupervised single image depth prediction with CNNs
- Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches
- Terrain and hydrological analysis based on LiDAR-derived digital elevation models (DEM) - Python package
- Phase correlation in scikit-image
- The Mapbox API provides images and elevation maps, article here
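Once a DEM exists (from stereo, LiDAR or SRTM), terrain derivatives are simple array maths. A minimal NumPy sketch of slope using central differences; it assumes a projected DEM with square cells whose size is in the same units as the elevations (e.g. metres) and no nodata values. The hydrological analysis package listed above covers this and much more.

```python
import numpy as np


def slope_degrees(dem, cell_size):
    """Terrain slope in degrees from a DEM grid with square cells of `cell_size`."""
    # np.gradient returns d/d(row) then d/d(col); one scalar spacing applies to both axes
    dz_dy, dz_dx = np.gradient(dem.astype(np.float64), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


# A ramp rising 10 m per 10 m cell eastwards -> 45 degree slope everywhere
ramp = np.tile(np.arange(5) * 10.0, (5, 1))
print(slope_degrees(ramp, cell_size=10.0))
```

For a DEM in geographic coordinates (degrees), the cell size differs in x and y and must first be converted to ground distance, or the DEM reprojected.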
Lidar
NDVI - vegetation index
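- Simple band math, but challenging due to the size of the imagery:
import numpy as np
ndvi = np.true_divide(ir.astype(float) - r, ir.astype(float) + r)  # ir = NIR band, r = red band; the float cast avoids unsigned-integer wrap-around
- Example notebook local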
- Landsat data in cloud optimised (COG) format analysed for NDVI with medium article here.
- Visualise water loss with Holoviews
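The band math above breaks in two common ways: unsigned integer bands (e.g. uint16 digital numbers) wrap around on subtraction, and pixels where NIR + red is zero produce division warnings. A NumPy sketch that handles both and processes the scene in blocks, so only one window is converted to float at a time; the 1024-pixel block size is an arbitrary assumption, and with rasterio or a memory-mapped array the same loop would read each window from disk instead.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); pixels where NIR + Red == 0 are set to NaN."""
    nir = nir.astype(np.float64)   # cast first so uint bands cannot wrap around
    red = red.astype(np.float64)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def ndvi_in_blocks(nir, red, block=1024):
    """Compute NDVI window by window to limit the size of temporary float arrays."""
    out = np.empty(nir.shape, dtype=np.float32)
    for r in range(0, nir.shape[0], block):
        for c in range(0, nir.shape[1], block):
            win = (slice(r, r + block), slice(c, c + block))
            out[win] = ndvi(nir[win], red[win])
    return out


nir = np.array([[3000, 0], [200, 800]], dtype=np.uint16)
red = np.array([[1000, 0], [600, 800]], dtype=np.uint16)
print(ndvi_in_blocks(nir, red, block=1))  # [[0.5, nan], [-0.5, 0.0]]
```
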
SAR
- Removing speckle noise from Sentinel-1 SAR using a CNN
- A dataset which is specifically made for deep learning on SAR and optical imagery is the SEN1-2 dataset, which contains corresponding patch pairs of Sentinel 1 (VV) and 2 (RGB) data. It is the largest manually curated dataset of S1 and S2 products, with corresponding labels for land use/land cover mapping, SAR-optical fusion, segmentation and classification tasks. Paper: https://elib.dlr.de/128117/1/SEN12MS_Preprint.pdf Data: https://mediatum.ub.tum.de/1474000
- so2sat on Tensorflow datasets - So2Sat LCZ42 is a dataset consisting of co-registered synthetic aperture radar and multispectral optical image patches acquired by the Sentinel-1 and Sentinel-2 remote sensing satellites, and the corresponding local climate zones (LCZ) label. The dataset is distributed over 42 cities across different continents and cultural regions of the world.
- Using Machine Learning to Automatically Detect Volcanic Unrest in a Time Series of Interferograms
Image formats, data management and catalogues
- GeoServer -> an open source server for sharing geospatial data.
- https://terria.io/ for pretty catalogues
- Remote pixel
- Sentinel-hub eo-browser
- Large datasets may come in HDF5 format, can view with -> https://www.hdfgroup.org/downloads/hdfview/
- Climate data is often in netcdf format, which can be opened using xarray
- The xarray docs list a number of ways that data can be stored and loaded.
- TileDB -> a 'Universal Data Engine' to store, analyze and share any data (beyond tables), with any API or tool (beyond SQL) at planet-scale (beyond clusters), open source and managed options. Recently hiring to work with xarray, dask, netCDF and cloud native storage
- Open Data Cube - serve up cubes of data https://www.opendatacube.org/
Cloud Optimised GeoTiff (COG)
- https://www.cogeo.org/
- TLDR: A Cloud Optimized GeoTIFF (COG) is a regular GeoTIFF file, aimed at being hosted on a HTTP file server (or Cloud object storage like S3), with an internal organization that enables more efficient workflows on the cloud. In particular they support HTTP range requests, enabling downloading of specific tiles rather than the full file. COGs work normally in GIS software such as QGIS.
- Intro presentation from Saheel Ahmed
- cog-best-practices
- rio-cogeo -> Cloud Optimized GeoTIFF (COG) creation and validation plugin for Rasterio.
- aiocogeo -> Asynchronous cogeotiff reader (python asyncio)
- Landsat data in cloud optimised (COG) format analysed for NDVI with medium article Cloud Native Geoprocessing of Earth Observation Satellite Data with Pangeo.
STAC - SpatioTemporal Asset Catalog specification
The STAC specification provides a common metadata specification, API, and catalog format to describe geospatial assets, so they can more easily be indexed and discovered. A 'spatiotemporal asset' is any file that represents information about the earth captured in a certain space and time. (from intake-stac docs)
- The aim is that the catalogue is crawlable so it can be indexed by a search engine and make imagery discoverable, without requiring yet another API interface.
- An initiative of https://www.radiant.earth/ in particular https://github.com/cholmes
- Spec at https://github.com/radiantearth/stac-spec
- Browser at https://github.com/radiantearth/stac-browser
- stacindex -> STAC Catalogs, Collections, APIs, Software and Tools
- Talk at https://docs.google.com/presentation/d/1O6W0lMeXyUtPLl-k30WPJIyH1ecqrcWk29Np3bi6rl0/edit#slide=id.p
- Example catalogue at https://landsat-stac.s3.amazonaws.com/catalog.json
- Chat https://gitter.im/SpatioTemporal-Asset-Catalog/Lobby
- Several useful repos on https://github.com/sat-utils
- Intake-STAC -> Intake-STAC provides an opinionated way for users to load Assets from STAC catalogs into the scientific Python ecosystem. It uses the intake-xarray plugin and supports several file formats including GeoTIFF, netCDF, GRIB, and OpenDAP.
- sat-utils/sat-search -> Sat-search is a Python 3 library and a command line tool for discovering and downloading publicly available satellite imagery using a STAC-compliant API
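To make the spec concrete, a STAC Item is just a GeoJSON Feature with a few extra fields. A minimal sketch of building one with the standard library, assuming the STAC 1.0.0 core Item fields; the item id, bounding box and asset URL are hypothetical, and real catalogues should be checked with a STAC validator (libraries such as pystac can build and validate Items for you).

```python
import json


def make_stac_item(item_id, bbox, datetime_utc, asset_href):
    """Build a minimal STAC Item (a GeoJSON Feature) describing a single COG."""
    west, south, east, north = bbox
    # Rectangular footprint, counter-clockwise exterior ring, closed on its first point
    footprint = {
        "type": "Polygon",
        "coordinates": [[
            [west, south], [east, south], [east, north], [west, north], [west, south]
        ]],
    }
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": footprint,
        "bbox": [west, south, east, north],
        "properties": {"datetime": datetime_utc},   # ISO 8601 timestamp, UTC
        "links": [],
        "assets": {
            "image": {
                "href": asset_href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                "roles": ["data"],
            }
        },
    }


if __name__ == "__main__":
    example = make_stac_item(
        "example-scene",                           # hypothetical item id
        (-0.5, 51.3, 0.3, 51.7),                   # hypothetical bbox (lon/lat, WGS84)
        "2020-06-01T10:30:00Z",
        "https://example.com/example-scene.tif",   # hypothetical asset URL
    )
    print(json.dumps(example, indent=2))
```

Because every Item is plain JSON linked from a catalog.json, a static catalogue can be hosted on S3 and crawled without any server-side API, which is the discoverability goal described above.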
State of the art
What are companies doing?
- Overall trend to using cloud (i.e. AWS, Google or Azure) storage buckets for hosting imagery
- Airbus are using a Google backend
- Planet are also on Google, not too surprising as Google own significant stock in Planet
- A serverless pipeline appears to be where companies are headed for routine compute tasks, whilst providing a Jupyter notebook approach for custom analysis. Check out 'Process Satellite data using AWS Lambda functions'
- Traditional data formats aren't designed for processing, so new standards are developing such as cloud optimised geotiffs and zarr
- Google provide training on how to use Apache Spark on Google Cloud Dataproc to distribute a computationally intensive (satellite) image processing task onto a cluster of machines -> https://google.qwiklabs.com/focuses/5834?parent=catalog
Online platforms for Geo analysis
- This article discusses some of the available platforms -> TLDR Pangeo rocks, but must BYO imagery
- Pangeo - open source resources for parallel processing using Dask and Xarray http://pangeo.io/index.html
- Airbus Sandbox -> will provide access to imagery
- Descartes Labs -> access to EO imagery from a variety of providers via python API -> not clear which imagery is available (Airbus + others?) or pricing
- DigitalGlobe have a cloud hosted Jupyter notebook platform called GBDX. Cloud hosting means they can guarantee the infrastructure supports their algorithms, and they appear to be close/closer to deploying DL. Tutorial notebooks here. Only Sentinel-2 and Landsat data on free tier.
- Planet have a Jupyter notebook platform which can be deployed locally.
- Earth-i Spectrum appears to allow processing of imagery, with the capability to perform segmentation, change detection, object recognition. This promo video contains some screenshots of the application.
Free online computing resources
Generally a GPU is required for DL, and this section lists a couple of free Jupyter environments with GPU available. There is a good overview of online Jupyter development environments on the fast.ai site. I personally use Colab with data hosted on Google Drive
Google Colab
- Colaboratory notebooks with a GPU backend, free for 12 hours at a time. Note that the GPU may be shared with other users, so if you aren't getting good performance try reloading.
- Also a pro tier for $10 a month -> https://colab.research.google.com/signup
- TensorFlow and PyTorch can be installed
Kaggle - also Google!
- Free to use
- GPU Kernels - may run for 1 hour
- Tensorflow, pytorch & fast.ai available
- Advantage that many datasets are already available
Paperspace
- Free tier available
- https://docs.paperspace.com/gradient/instances/free-instances
Production
Once you have a trained model, how do you expose it to the internet and other services? Usually through a REST API. This section lists a number of hosting options.
Custom REST API
- Basic https://blog.keras.io/building-a-simple-keras-deep-learning-rest-api.html with code here
- Advanced https://www.pyimagesearch.com/2018/01/29/scalable-keras-deep-learning-rest-api/
Tensorflow Serving
- https://www.tensorflow.org/serving/
- TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs. Multiple models, or indeed multiple versions of the same model, can be served simultaneously. TensorFlow Serving comes with a scheduler that groups individual inference requests into batches for joint execution on a GPU
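TensorFlow Serving exposes a REST predict endpoint of the form /v1/models/<name>:predict that accepts a JSON body containing an "instances" list and returns a "predictions" list. A minimal standard-library client sketch; the model name, host and port 8501 (the REST port used in the TF Serving docs) are assumptions, and the shape of each instance must match your model's serving signature.

```python
import json
import urllib.request


def build_predict_request(instances, model_name, host="localhost", port=8501):
    """Prepare a POST to TensorFlow Serving's REST predict endpoint (row / "instances" format)."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(instances, model_name, **kwargs):
    """Send the request and return the "predictions" list from the JSON response."""
    req = build_predict_request(instances, model_name, **kwargs)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["predictions"]
```

For example, `predict(chips, "landcover")` (with a hypothetical model called landcover) would send a batch of image chips as nested lists. JSON is verbose for imagery, so for large chips the gRPC API or base64-encoded inputs are usually preferred.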
Pytorch serve
AWS sagemaker
Paperspace gradient
chip-n-scale-queue-arranger by developmentseed
- https://github.com/developmentseed/chip-n-scale-queue-arranger
- an orchestration pipeline for running machine learning inference at scale
- Supports fast.ai models
Useful open source software
- GDAL -> THE tool for reading and writing raster and vector geospatial data formats
- QGIS- Create, edit, visualise, analyse and publish geospatial information. Python scripting and plugins.
- Orfeo toolbox - remote sensing toolbox with python API (just a wrapper to the C++ code). Do activities such as pansharpening, ortho-rectification, image registration, image segmentation & classification. Not much documentation.
- QUICK TERRAIN READER - view DEMS, Windows
- dl-satellite-docker -> docker files for geospatial analysis, including tensorflow, pytorch, gdal, xgboost...
- AIDE V2 - Tools for detecting wildlife in aerial images using active learning
- Land Cover Mapping web app from Microsoft
- Solaris -> An open source ML pipeline for overhead imagery by CosmiQ Works, similar to Raster Vision but with some unique, very cool features
- openSAR -> Synthetic Aperture Radar (SAR) Tools and Documents from Earth Big Data LLC (http://earthbigdata.com/)
Python low level numerical & data manipulation
- Dask -> Read and manipulate tiled GeoTIFF datasets
- Rasterio -> reads and writes GeoTIFF and other raster formats and provides a Python API based on Numpy N-dimensional arrays and GeoJSON.
- xarray -> N-D labeled arrays and datasets. Read Handling multi-temporal satellite images with Xarray
- xarray-spatial -> Fast, Accurate Python library for Raster Operations. Implements algorithms using Numba and Dask, free of GDAL
- Geowombat -> geo-utilities applied to air- and space-borne imagery, uses Rasterio, Xarray and Dask for I/O and distributed computing with named coordinates
- NumpyTiles -> a specification for providing multiband full-bit depth raster data in the browser
- Zarr -> Zarr is a format for the storage of chunked, compressed, N-dimensional arrays. Zarr depends on NumPy
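As an example of the band arithmetic these libraries are used for, the sketch below computes NDVI, (NIR - Red) / (NIR + Red), on plain NumPy arrays such as those returned by Rasterio's `read()`. Returning NaN where the denominator is zero is a choice made for this sketch, not a standard; libraries such as xarray-spatial provide their own implementations.

```python
import numpy as np


def ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Normalised Difference Vegetation Index, (NIR - Red) / (NIR + Red).

    `red` and `nir` are same-shaped band arrays (reflectance or DN).
    Pixels where NIR + Red == 0 are set to NaN instead of raising a warning.
    """
    red = red.astype("float64")
    nir = nir.astype("float64")
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; other pixels keep NaN.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out
```

For Sentinel-2, red and NIR are bands B04 and B08; which index each band has in a given file depends on how that file was written.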
Python general utilities
- gcsfs -> Pythonic file-system interface for Google Cloud Storage
- satpy - a python library for reading and manipulating meteorological remote sensing data and writing it to various image and data file formats
- Pyviz examples include several interesting geospatial visualisations
- geemap: A Python package for interactive mapping with Google Earth Engine, ipyleaflet, and ipywidgets. See the Landsat timelapse example
- rio-color -> Color correction plugin for Rasterio
- WaterDetect -> an end-to-end algorithm to generate open water cover masks, specifically designed for L2A Sentinel 2 imagery. It can also be used for Landsat 8 images and for other multispectral clustering/segmentation tasks.
- DeepHyperX -> A Python/pytorch tool to perform deep learning experiments on various hyperspectral datasets.
- landsat_ingestor -> Scripts and other artifacts for landsat data ingestion into Amazon public hosting
- PyShp -> The Python Shapefile Library (PyShp) reads and writes ESRI Shapefiles in pure Python
- s2p -> a Python library and command line tool that implements a stereo pipeline which produces elevation models from images taken by high resolution optical satellites such as Pléiades, WorldView, QuickBird, Spot or Ikonos
- TorchSat is an open-source deep learning framework for satellite imagery analysis based on PyTorch.
- torchvision-enhance -> Enhance PyTorch vision for semantic segmentation, multi-channel images and TIF files,...
- felicette -> Satellite imagery for dummies. Generate JPEG earth imagery from coordinates/location name with publicly available satellite data.
- napari -> napari is a fast, interactive, multi-dimensional image viewer for Python. It’s designed for browsing, annotating, and analyzing large multi-dimensional images. By integrating closely with the Python ecosystem, napari can be easily coupled to leading machine learning and image analysis tools. Example viewing Landsat-8 imagery
Tools for image annotation
If you are performing object detection you will need to annotate images. Check that your annotation tool of choice supports large image files (likely GeoTIFFs), as not all do. Note also that GeoJSON is widely used by remote sensing researchers, but this annotation format is not commonly supported by general computer vision frameworks.
- Labelme Image Annotation for Geotiffs -> uses Labelme
- CVAT is worth investigating, and has an open issue to support large TIFF files. This article on Roboflow gives a good intro to CVAT.
- Deep Block is a general purpose AI platform that includes a tool for COCOJSON export for aerial imagery. Check out this video
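Because GeoJSON labels are usually stored in map coordinates, using them in a computer vision framework typically means converting each polygon to pixel coordinates and then to that framework's box format. The sketch below does this for a COCO-style `[x_min, y_min, width, height]` box. It assumes a north-up image described by a GDAL-style geotransform and a polygon in the same CRS as the image; the function name is hypothetical, and rotated geotransforms are rejected rather than handled.

```python
def geojson_to_coco_bbox(feature: dict, geotransform: tuple) -> list:
    """Convert a GeoJSON Polygon feature (map coordinates) to a COCO
    [x_min, y_min, width, height] box in pixel coordinates.

    `geotransform` follows GDAL's order: (x_origin, pixel_width, row_rotation,
    y_origin, column_rotation, pixel_height). Rotation terms must be zero here.
    """
    x0, px_w, rot1, y0, rot2, px_h = geotransform
    if rot1 or rot2:
        raise ValueError("rotated geotransforms are not handled in this sketch")
    geom = feature["geometry"]
    if geom["type"] != "Polygon":
        raise ValueError("only Polygon geometries are handled in this sketch")
    ring = geom["coordinates"][0]  # outer ring only; holes do not affect the box
    # Invert the (unrotated) affine transform: map x/y -> pixel column/row.
    cols = [(pt[0] - x0) / px_w for pt in ring]
    rows = [(pt[1] - y0) / px_h for pt in ring]  # px_h is negative for north-up images
    x_min, y_min = min(cols), min(rows)
    return [x_min, y_min, max(cols) - x_min, max(rows) - y_min]
```

If your labels are in WGS84 longitude/latitude but the image is in a projected CRS, reproject the labels first, otherwise the pixel coordinates will be meaningless.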
Movers and shakers
- Adam Van Etten is doing interesting things in object detection and segmentation
- Andrew Cutts cohosts the Scene From Above podcast and has many interesting repos
- Ankit Kariryaa published a recent Nature paper on tree detection
- Chris Holmes is doing great things at Planet
- Christoph Rieke maintains a very popular imagery repo and has published his thesis on segmentation
- Jake Shermeyer has many interesting repos
- Nicholas Murray is an Australia-based scientist with a focus on delivering the science necessary to inform large scale environmental management and conservation
- Qiusheng Wu is an Assistant Professor in the Department of Geography at the University of Tennessee
- Robin Wilson is a former academic who is very active in the satellite imagery space
Courses
- Manning: Monitoring Changes in Surface Water Using Satellite Image Data
- Working with Geospatial Data in Python on Datacamp
Competitions
- SpaceNet 7: Multi-Temporal Urban Development Challenge - registration deadline Oct 28 2020. Track individual building construction over time from Planet imagery. This is challenging because of the small pixel area of each object, the high object density within images, and the dramatic image-to-image differences compared with the frame-to-frame variation in video object tracking.
Online communities
Companies
- https://github.com/chrieke/geospatial-companies -> List of 500+ geospatial companies by Christoph Rieke
- Dymaxion Analytics -> a machine learning API for developing bespoke object detection models for satellite and drone imagery.
- Element84 -> consultancy
- CosmiQ Works -> an IQT Lab focused on developing, prototyping, and evaluating emerging open source artificial intelligence capabilities for geospatial use cases.
Jobs
- Pangeo discourse lists multiple jobs, global
Neural nets in space
Processing on the satellite allows less data to be downlinked. For example, a super-resolution image might be generated from 4-8 input images, after which only the single output image is downlinked.
- Lockheed Martin and USC to Launch Jetson-Based Nanosatellite for Scientific Research Into Orbit - Aug 2020 - One app that will run on the GPU-accelerated satellite is SuperRes, an AI-based application developed by Lockheed Martin, that can automatically enhance the quality of an image.
- Intel to place Movidius in orbit to filter images of clouds at source - Oct 2020 - Getting rid of these images before they’re even transmitted means that the satellite can actually realize a bandwidth savings of up to 30%.
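The bandwidth argument comes down to simple arithmetic: any frame discarded on board is never transmitted. The sketch below models this with a hypothetical per-frame cloud fraction (as an on-board classifier might output) and a hypothetical 0.7 threshold; it is not the actual logic used by Intel or Lockheed Martin.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    size_bytes: int
    cloud_fraction: float  # 0..1, e.g. from a hypothetical on-board cloud classifier


def filter_for_downlink(frames: List[Frame], max_cloud: float = 0.7) -> Tuple[List[Frame], float]:
    """Keep frames with cloud_fraction <= max_cloud (hypothetical threshold).

    Returns the frames to downlink and the fraction of bytes saved by dropping the rest.
    """
    kept = [f for f in frames if f.cloud_fraction <= max_cloud]
    total = sum(f.size_bytes for f in frames)
    saved = total - sum(f.size_bytes for f in kept)
    return kept, (saved / total if total else 0.0)
```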
About the author
My background is optical physics, and I have a PhD from Cambridge on the topic of plasmon-enhanced Raman spectroscopy. After a postdoc I left academia and took a variety of roles, from industrial research at Sharp Labs Europe, to medical physics, to building optical telescopes at Surrey Satellites (SSTL). It was whilst at SSTL that I started this repo as a personal resource. I left SSTL (I was made redundant along with 30% of the company) and, after a brief stint at an IoT start-up, I now work as a data engineer. Deep learning is currently a hobby, but I have ambitions to move into this domain when the right opportunity presents itself. My own satellite imagery projects are here, and feel free to connect with me on LinkedIn.