Deep learning based object recognition in multispectral satellite imagery for real-time applications

In this repository we share an accurate, high-speed U-Net architecture that performs semantic segmentation for object recognition in multispectral satellite imagery. We focus on a specific problem: light-vehicle recognition in satellite imagery. Light-vehicle recognition demands the highest precision, since each object covers only ~120 pixels, and requires the model to generalize across dispersed scenes. The solution is also applicable to larger objects (e.g., aircraft, trucks, ships, buildings) and generalizable to other satellite imagery datasets. We show that our U-Net architecture exceeds human-level performance with state-of-the-art 97.67% accuracy over multiple sensors, generalizes across dispersed scenery, and outperforms other methods proposed to date. Its computationally light architecture delivers a fivefold improvement in training time and rapid prediction, essential for real-time applications.

Train_notebook.ipynb - Jupyter notebook containing the model training code.

make_image_coords.py - script for pre-processing image coordinates.

Dataset

The dataset used in these experiments was created from SpaceNet, an open-source raw satellite imagery database with high-resolution imagery captured by the DigitalGlobe WorldView-3 satellite. A total of 125 high-resolution (30 cm per pixel) RGB-PanSharpen GeoTIFF satellite images, equivalent to a 50 km² area of interest (AOI) covering Paris, Shanghai, Las Vegas, and Khartoum, were used for this dataset.

A total of 350 hours of professional annotation work went into preparing a high-quality set of 80,316 labelled objects. Images were annotated using the QGIS geospatial imagery software. Labelling and polygon coordinate generation were completed manually by multiple professional annotators and cross-checked for quality. We are publishing this in-house-developed, proprietary dataset of labelled polygons online to enable further development in this field.

Repository structure

  • Annotations directory

    • img[1-125]_coordinates.csv - each file contains the object coordinates for one image. Each line holds the polygon coordinates of a single vehicle, relative to the image's top-left corner. Example:

      [491 784],[510 777],[515 782],[496 789],[491 784]
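      A minimal parsing sketch for these files, assuming the "[x y],[x y],..." line format shown above (the file path and helper name are illustrative):

      ```python
      # Parse one img*_coordinates.csv file into a list of pixel polygons.
      import re

      def read_polygons(path):
          """Return a list of polygons, each a list of (x, y) pixel tuples."""
          polygons = []
          with open(path) as f:
              for line in f:
                  # Each "[x y]" pair is one vertex of a vehicle polygon.
                  points = [(int(x), int(y))
                            for x, y in re.findall(r"\[(\d+)\s+(\d+)\]", line)]
                  if points:
                      polygons.append(points)
          return polygons

      # polygons = read_polygons("Annotations/img1_coordinates.csv")
      ```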
      
    • combined_polygons_125.csv - this file contains the coordinates of all objects across every image. Each line represents:

      Image ID, Class type, WKT multipolygon.
      

      Well-known text (WKT) is a text markup language for representing vector geometry objects on a map. Objects are described in the WGS84 reference coordinate system.
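      A minimal loading sketch, assuming the three-column layout above and the shapely package for WKT parsing:

      ```python
      # Load the image ID, class type, and WKT multipolygon for every object.
      import csv
      from shapely import wkt

      records = []
      with open("Annotations/combined_polygons_125.csv") as f:
          for row in csv.reader(f):
              if row[0].lower().startswith("image"):  # skip a header row, if present
                  continue
              image_id, class_type, multipolygon_wkt = row
              # WKT multipolygon in WGS84 coordinates.
              records.append((image_id, class_type, wkt.loads(multipolygon_wkt)))
      ```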

    • grid_sizes_125.csv - this file describes the minimum and maximum coordinates of every image. Each line represents:

      Image ID, X minimum, X maximum, Y minimum, Y maximum
      

      Objects are described in the WGS84 reference coordinate system.
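      Together with the known 1300×1300 px raster size, these extents let WGS84 coordinates be mapped onto pixel positions. Below is a sketch of the linear mapping; the assumption that the Y maximum corresponds to the image's top edge (the usual SpaceNet convention) should be verified against the data:

      ```python
      def geo_to_pixel(x, y, x_min, x_max, y_min, y_max, width=1300, height=1300):
          """Map a WGS84 (x, y) point into pixel (col, row) coordinates."""
          col = (x - x_min) / (x_max - x_min) * width
          row = (y_max - y) / (y_max - y_min) * height  # y axis flipped: y_max at top
          return col, row
      ```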

  • Images directory

    • img[1-125].tif - 125 raw satellite images (1300×1300 px, 30 cm per pixel).
    • img[1-125].png - original .png images created from the .tif files
    • img[1-40]_augmented.png - copies with random brightness changes (see the augmentation sketch after this list)

    • img[41-80]_augmented.png - copies with random noise added to all three channels

    • img[81-125]_augmented.png - copies with random noise added to a single channel
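The sketch below illustrates the three augmentation types listed above. It assumes NumPy and HxWx3 uint8 image arrays; the parameters are illustrative, not the exact values used to generate the published *_augmented.png files.

```python
import numpy as np

rng = np.random.default_rng()

def random_brightness(img):
    """Shift all pixels by a random brightness offset."""
    shift = rng.integers(-40, 40)
    return np.clip(img.astype(np.int16) + shift, 0, 255).astype(np.uint8)

def noise_all_channels(img, sigma=10):
    """Add Gaussian noise to all three channels."""
    noise = rng.normal(0, sigma, img.shape)
    return np.clip(img + noise, 0, 255).astype(np.uint8)

def noise_single_channel(img, sigma=10):
    """Add Gaussian noise to one randomly chosen channel."""
    out = img.astype(np.float64)
    c = rng.integers(0, 3)
    out[..., c] += rng.normal(0, sigma, img.shape[:2])
    return np.clip(out, 0, 255).astype(np.uint8)
```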

References

This dataset was derived from SpaceNet 2, Building Detection v2:

- SpaceNet on Amazon Web Services (AWS). "SpaceNet 2, Building Detection v2." The SpaceNet Catalog. Last modified October 1, 2018. Accessed September 1, 2019. https://spacenet.ai/datasets/

If you use data from this repository in a paper, please use the following citation:

- P. Gudžius, O. Kurasova, V. Darulis, and E. Filatovas, "VUDataScience," 2020.
Available: https://github.com/VUDataScience/Deep-learning-based-object-recognition-in-multispectral-satellite-imagery-for-low-latency-applicatio.