Cross-Modal-Remote-Sensing-Image-Sound-Retrieval

This repository is for the research of cross-modal remote sensing image-sound retrieval.

We construct three datasets for cross-modal remote sensing image-sound retrieval: the Sydney image-sound dataset, the UCM image-sound dataset, and the RSICD image-sound dataset. Descriptions of them are as follows:

Sydney Image-Sound Dataset: The Sydney image-sound dataset contains 613 remote sensing images and 3065 sounds of 7 classes, where each image corresponds to five different sounds.

UCM Image-Sound Dataset: The UCM image-sound dataset includes 2100 remote sensing images and 10500 sounds, where each image corresponds to five different sounds. The dataset is divided into 21 classes, each containing 100 images and 500 sounds.

RSICD Image-Sound Dataset: The RSICD image-sound dataset involves 10921 remote sensing images and 54605 sounds of 30 classes, where each image corresponds to five different sounds.
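In all three datasets the correspondence is the same: each image is paired with five sounds. The following is a minimal sketch of how one might enumerate these image-sound pairs after downloading a dataset. The directory layout, the `.tif` image extension, and the `<image_stem>_<k>.wav` naming convention are assumptions for illustration only and are not specified by this repository; adjust them to match the actual files.

```python
# Minimal sketch (not an official loader) for pairing each image with its
# five sounds. Paths and naming conventions below are assumptions.
from pathlib import Path


def build_image_sound_pairs(image_dir: str, sound_dir: str, sounds_per_image: int = 5):
    """Return a list of (image_path, [sound_paths]) tuples."""
    pairs = []
    for image_path in sorted(Path(image_dir).glob("*.tif")):  # assumed image extension
        stem = image_path.stem
        # Assumed naming convention: "<stem>_1.wav" ... "<stem>_5.wav".
        sound_paths = [
            Path(sound_dir) / f"{stem}_{k}.wav" for k in range(1, sounds_per_image + 1)
        ]
        pairs.append((image_path, sound_paths))
    return pairs


if __name__ == "__main__":
    # Hypothetical directory names; replace with the extracted dataset paths.
    pairs = build_image_sound_pairs("UCM/images", "UCM/sounds")
    print(f"{len(pairs)} images, {sum(len(s) for _, s in pairs)} sounds")
```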

Dataset:

Baidu Netdisk: https://pan.baidu.com/s/1gR-9MKxFtjSCTamj67_DQw (extraction code: 81cr)

If our datasets are helpful to you, please cite the related papers listed below:

[1] Mao G, Yuan Y, Lu X. Deep cross-modal retrieval for remote sensing image and audio[C]//2018 10th IAPR Workshop on Pattern Recognition in Remote Sensing (PRRS). IEEE, 2018: 1-7.

[2] Chen Y, Lu X, Wang S. Deep cross-modal image–voice retrieval in remote sensing[J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 58(10): 7049-7061.

[3] Ning H, Zhao B, Yuan Y. Semantics-Consistent Representation Learning for Remote Sensing Image–Voice Retrieval[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 60: 1-14.

[4] Chen Y, Xiong S, Mou L, et al. Deep Quadruple-based Hashing for Remote Sensing Image-Sound Retrieval[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022.