We captured ten different virtual indoor rescue scenarios with a drone. Each sequence was recorded three times, with both video and audio captured on the drone. We dressed mannequins as firefighters, rescuers, medical staff, and members of the general public (male, female, child). The mannequins were arranged in a different pose for each scenario, and the voice in the rescue-request audio was varied (male, female, child).
In total, we captured 30 sets of data (ten scenarios, three recordings each) for multi-object detection, crowd counting, optical character recognition, speaker recognition, etc. Images have a resolution of 1920x1080, and voice data was acquired with a 7-channel microphone (16 kHz sampling rate, chunk size of 1024).
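To make the audio parameters concrete, the following is a minimal sketch of how one chunk could be interpreted. It assumes that the chunk size of 1024 counts frames per channel and that the samples are interleaved 16-bit PCM; neither detail is stated above, and the function names are hypothetical, so check them against the actual files before relying on this.

```python
import array

SAMPLE_RATE_HZ = 16_000  # from the dataset description
CHUNK_FRAMES = 1024      # from the dataset description (assumed: frames per channel)
NUM_CHANNELS = 7         # from the dataset description


def chunk_duration_ms(frames=CHUNK_FRAMES, rate=SAMPLE_RATE_HZ):
    """Return the duration of one chunk in milliseconds."""
    return 1000.0 * frames / rate


def split_channels(raw, num_channels=NUM_CHANNELS):
    """Deinterleave raw PCM bytes into one list of samples per channel.

    Assumption: 16-bit signed samples in native byte order, interleaved
    as ch0, ch1, ..., ch6, ch0, ch1, ...
    """
    samples = array.array("h")
    samples.frombytes(raw)
    if len(samples) % num_channels:
        raise ValueError("sample count is not a multiple of the channel count")
    return [list(samples[ch::num_channels]) for ch in range(num_channels)]


if __name__ == "__main__":
    print(chunk_duration_ms())  # 64.0
```

Under these assumptions, one chunk covers 64 ms of audio, and a full chunk holds 1024 x 7 = 7168 samples across all channels.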
This open-source dataset is a collaboration between NCSOFT, UVify Co., Ltd., Sogang University, and mpWAV Inc. Additional information about the dataset can be found at the URL below.