Repository for generating frames from the Singapore Maritime Dataset (SMD) videos and converting the corresponding ground truth files. Some basic statistics about the dataset are also generated.
Before running any of the scripts or Jupyter notebooks, first download the dataset and unrar/unzip it, preferably in the home folder of this project. The dataset can be acquired from https://sites.google.com/site/dilipprasad/home/singapore-maritime-dataset.
- convert_mat_to_csv_LEGACY.py : A legacy script that creates some CSV files needed for the data statistics. These files are already included in this repo (objects_nir.txt, objects_onshore.txt, objects_onboard.txt), but this is the script to regenerate them if required.
- load_mat_into_csv_xml.py : A script to convert the Singapore Maritime Dataset (SMD) .mat object ground truth files into CSV (TensorFlow compatible) and VOC XML formats for further processing.
- generate_tfrecord.py : A script to generate a TFRecord from the CSV files produced by the load_mat_into_csv_xml.py script. Please see its documentation for usage.
- Singapore_dataset_frames_generation_and_histograms.ipynb : This notebook generates frames from the SMD videos. It can generate all the frames or every Nth frame and split them into train/test datasets (default ratio 70%/30%). This notebook generated the first dataset I worked with.
- Singapore_dataset_frames_generation_2nd_dataset.ipynb : Like the previous one, this notebook generates every Nth frame of the videos and splits them into train/test datasets. It also has the option to assign some videos entirely to the test dataset. This notebook generated the second dataset I worked with.
- Singapore_maritime_dataset_statistics_all_frames.ipynb : Notebook to generate several statistics for the full dataset (all frames).
- Singapore_maritime_dataset_statistics_split_first_dataset.ipynb : Notebook to generate several statistics for the first dataset split into train/test.
- Singapore_maritime_dataset_statistics_split_second_dataset.ipynb : Notebook to generate several statistics for the second dataset split into train/test. (This is the same notebook as the one for the first dataset split, applied to the second dataset.)
- objects_nir.txt : CSV file that contains data for all objects in the near-infrared dataset. Generated by the legacy script above.
- objects_onshore.txt : CSV file that contains data for all objects in the onshore dataset. Generated by the legacy script above.
- objects_onboard.txt : CSV file that contains data for all objects in the onboard dataset. Generated by the legacy script above.
- figures : contains all figures generated from the full dataset using the Singapore_maritime_dataset_statistics_all_frames.ipynb notebook.
- figures_split : contains all figures generated from the first train/test split dataset using the Singapore_maritime_dataset_statistics_split_first_dataset.ipynb notebook.
- figures_split_split : contains all figures generated from the second train/test split dataset using the Singapore_maritime_dataset_statistics_split_second_dataset.ipynb notebook.
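The frame selection and train/test splitting described for the two frame generation notebooks can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the function names, the seeded random shuffle, and the `(video, frame_index)` representation are all assumptions made for this example. Only the "every Nth frame" option, the 70%/30% default ratio, and the option to keep whole videos for testing come from the descriptions above.

```python
import random


def select_frame_indices(total_frames, every_nth=1):
    """Return the indices of the frames to keep: 0, N, 2N, ... (hypothetical helper)."""
    if every_nth < 1:
        raise ValueError("every_nth must be >= 1")
    return list(range(0, total_frames, every_nth))


def split_train_test(items, train_ratio=0.7, seed=0):
    """Shuffle items reproducibly and split them into (train, test).

    Assumption: a random shuffle is used; the notebooks may split differently.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


def split_with_held_out_videos(frames_by_video, test_only_videos,
                               train_ratio=0.7, seed=0):
    """Split (video, frame_index) pairs, sending some videos entirely to test."""
    held_out = []   # frames from videos reserved for the test set
    remaining = []  # frames that take part in the ratio-based split
    for video, indices in frames_by_video.items():
        pairs = [(video, i) for i in indices]
        if video in test_only_videos:
            held_out.extend(pairs)
        else:
            remaining.extend(pairs)
    train, test = split_train_test(remaining, train_ratio, seed)
    return train, test + held_out


if __name__ == "__main__":
    # Hypothetical video names and frame counts, for illustration only.
    frames = {
        "video_a": select_frame_indices(100, every_nth=5),
        "video_b": select_frame_indices(60, every_nth=5),
    }
    train, test = split_with_held_out_videos(frames, test_only_videos={"video_b"})
    print(len(train), "training frames,", len(test), "test frames")
```

Keeping some videos entirely in the test set, as the second notebook allows, helps avoid near-duplicate consecutive frames from the same video appearing in both the training and test data.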
Some basic statistics generated for the whole dataset are given here.
If you use the Singapore Maritime Dataset, please cite it as: D. K. Prasad, D. Rajan, L. Rachmawati, E. Rajabaly, and C. Quek, "Video Processing from Electro-optical Sensors for Object Detection and Tracking in Maritime Environment: A Survey," IEEE Transactions on Intelligent Transportation Systems (IEEE), 2017.
If you use code or figures from this repo, please cite this repository as:
Tilemachos Bontzorlos, "Singapore Maritime Dataset frames ground truth generation and statistics", GitHub repository, Feb. 2019. https://github.com/tilemmpon/Singapore-Maritime-Dataset-Frames-Ground-Truth-Generation-and-Statistics.
To report an issue, use the GitHub issue tracker. Please provide as much information as you can.
Contributions are always welcome. Open an issue to contact me. The preferred method of contribution is through a GitHub pull request.