Singapore-Maritime-Dataset-Frames-Ground-Truth-Generation-and-Statistics

Repository for generating frames from the Singapore Maritime Dataset (SMD) videos and converting the corresponding ground truth files. It also generates some basic statistics about the dataset.

Dataset

Before running any of the scripts or Jupyter notebooks, download the dataset and extract it (unrar/unzip), preferably into the home (top-level) folder of this project. The dataset is available at https://sites.google.com/site/dilipprasad/home/singapore-maritime-dataset.

Explanation of python scripts

convert_mat_to_csv_LEGACY.py : A legacy script that creates the CSV files needed for the data statistics. These files are already included in this repo (objects_nir.txt, objects_onshore.txt, objects_onboard.txt), but this script can regenerate them if required.
load_mat_into_csv_xml.py : A script that converts the Singapore Maritime Dataset (SMD) .mat object ground truth files into CSV (TensorFlow compatible) and VOC XML formats for further processing.
generate_tfrecord.py : A script that generates a tfrecord from the CSV files produced by load_mat_into_csv_xml.py. Please see its documentation for usage.
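
To illustrate the kind of conversion load_mat_into_csv_xml.py performs, here is a minimal sketch using only the Python standard library. It is not the script itself: the function names, the [x, y, width, height] box layout, and the CSV column order are assumptions chosen for illustration, and reading the .mat files is left out entirely.

```python
import xml.etree.ElementTree as ET

# Assumed CSV column order (hypothetical; check the script's actual output).
CSV_HEADER = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def box_to_corners(x, y, w, h):
    """Convert an assumed [x, y, width, height] box to integer corners."""
    return int(round(x)), int(round(y)), int(round(x + w)), int(round(y + h))


def to_csv_row(filename, img_w, img_h, label, box):
    """Build one CSV row (matching CSV_HEADER) for a single object."""
    xmin, ymin, xmax, ymax = box_to_corners(*box)
    return [filename, img_w, img_h, label, xmin, ymin, xmax, ymax]


def to_voc_xml(filename, img_w, img_h, objects):
    """Build a Pascal VOC style annotation string for one frame.

    `objects` is a list of (label, [x, y, w, h]) tuples.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(img_w)
    ET.SubElement(size, "height").text = str(img_h)
    ET.SubElement(size, "depth").text = "3"
    for label, box in objects:
        corners = box_to_corners(*box)
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bnd = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), corners):
            ET.SubElement(bnd, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Calling to_csv_row once per object and to_voc_xml once per frame would give the two output formats side by side; the file name and class label passed in are up to the caller.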

Explanation of Jupyter notebooks

Singapore_dataset_frames_generation_and_histograms.ipynb : This notebook generates frames from the SMD videos. It can generate all the frames or every Nth frame and split them into train/test datasets (default ratio 70%/30%). This notebook generated the first dataset I worked with.
Singapore_dataset_frames_generation_2nd_dataset.ipynb : Like the previous one, this notebook generates every Nth frame of the videos and splits them into train/test datasets. It also has the option to assign some videos entirely to the test dataset. This notebook generated the second dataset I worked with.
Singapore_maritime_dataset_statistics_all_frames.ipynb : Notebook to generate several statistics for the full dataset (all frames).
Singapore_maritime_dataset_statistics_split_first_dataset.ipynb : Notebook to generate several statistics for the first dataset split into train/test.
Singapore_maritime_dataset_statistics_split_second_dataset.ipynb : Notebook to generate several statistics for the second dataset split into train/test. (This is the same notebook as for the first dataset split statistics but used for the second dataset.)
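
The sampling and splitting logic described for the frame generation notebooks can be sketched as follows. This is a simplified illustration, not the notebooks' code: the function names, the random shuffling, and the fixed seed are assumptions, and the actual frame extraction from the videos is omitted.

```python
import random


def select_every_nth(num_frames, n):
    """Return the indices of every n-th frame, starting at frame 0."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return list(range(0, num_frames, n))


def split_frames(frames_per_video, n=5, train_ratio=0.7, test_only_videos=(), seed=0):
    """Split (video, frame_index) pairs into train and test lists.

    frames_per_video: dict mapping a video name to its frame count.
    test_only_videos: videos whose sampled frames all go to the test set.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for video in sorted(frames_per_video):
        samples = [(video, i) for i in select_every_nth(frames_per_video[video], n)]
        if video in test_only_videos:
            test.extend(samples)  # held-out video: nothing goes to train
            continue
        rng.shuffle(samples)
        cut = round(len(samples) * train_ratio)
        train.extend(samples[:cut])
        test.extend(samples[cut:])
    return train, test
```

Holding out whole videos for testing, as the second notebook allows, avoids having near-identical neighbouring frames of the same video on both sides of the split.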

Explanation of files

objects_nir.txt : CSV file that contains data from all objects from the near infra-red dataset. Generated by the legacy script above.
objects_onshore.txt : CSV file that contains data from all objects from the onshore dataset. Generated by the legacy script above.
objects_onboard.txt : CSV file that contains data from all objects from the onboard dataset. Generated by the legacy script above.

Explanation of folders

figures : contains all figures generated from the full dataset using the Singapore_maritime_dataset_statistics_all_frames.ipynb notebook.
figures_split : contains all figures generated from the first train/test split dataset using the Singapore_maritime_dataset_statistics_split_first_dataset.ipynb notebook.
figures_split_split : contains all figures generated from the second train/test split dataset using the Singapore_maritime_dataset_statistics_split_second_dataset.ipynb notebook.

Example Statistics

Below are some basic statistics generated for the whole dataset.

Histogram of the object area ratio relative to the total image area
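
As a rough illustration of the quantity plotted here, the area ratio of an object is its bounding box area divided by the frame area. The sketch below is not taken from the notebooks: the function names, the default 1920x1080 frame size, and the equal-width binning are assumptions.

```python
def area_ratio(box_w, box_h, img_w=1920, img_h=1080):
    """Fraction of the frame covered by a box (frame size is an assumed default)."""
    return (box_w * box_h) / (img_w * img_h)


def histogram(values, num_bins=10, upper=1.0):
    """Count values in equal-width bins over [0, upper]; upper falls in the last bin."""
    counts = [0] * num_bins
    for value in values:
        index = min(int(value / upper * num_bins), num_bins - 1)
        counts[index] += 1
    return counts
```

Because most objects in maritime footage cover a small part of the frame, a smaller `upper` or more bins may give a more readable histogram.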

Frequency of objects' type, motion and distance per dataset and combined

Distance of objects by type and video source

Motion of objects by type and video source

Object type count per video (absolute values)

Object type count per video (normalized)

Heatmap of all objects in the dataset
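
One simple way to build such a heatmap is to count, for each cell of a coarse grid over the frame, how many bounding boxes overlap it. The sketch below shows that idea only; how the notebook computes its heatmap is not described here, and the function name, the [x, y, width, height] box layout, and the cell size are assumptions.

```python
def accumulate_heatmap(boxes, img_w, img_h, cell=40):
    """Count how many boxes overlap each cell of a grid laid over the frame.

    boxes: iterable of (x, y, w, h) tuples in pixels (assumed layout).
    Returns a list of rows, each a list of per-cell counts.
    """
    cols = -(-img_w // cell)  # ceiling division
    rows = -(-img_h // cell)
    grid = [[0] * cols for _ in range(rows)]
    for x, y, w, h in boxes:
        # Clamp the covered cell range to the grid boundaries.
        c0 = max(int(x) // cell, 0)
        r0 = max(int(y) // cell, 0)
        c1 = min(int(x + w - 1) // cell, cols - 1)
        r1 = min(int(y + h - 1) // cell, rows - 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                grid[r][c] += 1
    return grid
```

The resulting grid can be passed to any plotting tool that renders a 2D array as an image.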

Citing

If you use the Singapore Maritime Dataset, please cite it as: D. K. Prasad, D. Rajan, L. Rachmawati, E. Rajabally, and C. Quek, "Video Processing from Electro-optical Sensors for Object Detection and Tracking in Maritime Environment: A Survey," IEEE Transactions on Intelligent Transportation Systems (IEEE), 2017.

If you use code or figures from this repo, please cite this repository as:

Tilemachos Bontzorlos, "Singapore Maritime Dataset frames ground truth generation and statistics", GitHub repository, Feb. 2019. https://github.com/tilemmpon/Singapore-Maritime-Dataset-Frames-Ground-Truth-Generation-and-Statistics.

Contribution

To report an issue, use the GitHub issue tracker. Please provide as much information as you can.

Contributions are always welcome. Open an issue to contact me. The preferred method of contribution is a GitHub pull request.