We consider the multi-label VHR UC-Merced dataset (domain A), where domain B corresponds to speech signals. In particular, we construct a corpus of spoken samples in .wav format for each of the land-cover semantic labels. To increase the diversity of the speech samples, the labels are pronounced with different English accents. In this way, we gather 15 speech samples per label, leading to 255 speech samples in total. Also, note that the multi-label UC-Merced dataset consists of 2100 VHR images of size 256×256, where each image is associated with multiple semantic labels from a set of 17 land-cover categories.
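For a quick sanity check of the speech corpus, here is a minimal Python sketch that indexes the .wav clips per label and verifies the expected counts (17 labels × 15 clips = 255). The directory layout `audio/<label>/*.wav` and the `index_speech_corpus` helper are assumptions for illustration; adapt the paths to the actual archive structure.

```python
from pathlib import Path

# Hypothetical layout (the archive structure is not specified above):
#   audio/<label>/*.wav   -- 17 label folders x 15 clips each = 255 files
AUDIO_ROOT = Path("audio")

def index_speech_corpus(root):
    """Map each land-cover label to the sorted list of its .wav clips."""
    return {
        label_dir.name: sorted(label_dir.glob("*.wav"))
        for label_dir in sorted(root.iterdir())
        if label_dir.is_dir()
    }

if __name__ == "__main__":
    corpus = index_speech_corpus(AUDIO_ROOT)
    total = sum(len(clips) for clips in corpus.values())
    print(f"{len(corpus)} labels, {total} clips")  # expected: 17 labels, 255 clips
```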
If you find this dataset useful, please cite the following paper:
Ushasi Chaudhuri, Biplab Banerjee, Avik Bhattacharya, Mihai Datcu, "CMIR-NET: A deep learning based model for cross-modal retrieval in remote sensing," Pattern Recognition Letters, vol. 131, pp. 456–462, March 2020. https://doi.org/10.1016/j.patrec.2020.02.006