Deep mutual attention network for acoustic scene classification

👍👍👍🤙🤙🤙

This repository provides the Official PyTorch implementation of DMA-Net, from the following paper:

Deep mutual attention network for acoustic scene classification. Digital Signal Processing, 2022.
Wei Xie, Qianhua He, Zitong Yu, Yanxiong Li

South China University of Technology, Guangzhou, China

Figure 1: Schematic diagram of the proposed deep mutual attention network.

For the sake of simplicity, this repository provides instructions on processing the DCASE2018-1a dataset using both PyTorch and MATLAB. However, it can easily be adapted for processing other datasets by making changes solely to the dataset path and label information.

DATASET

The dataset is downloadable from here

The dataset contains 10 classes of audio scenes, recorded with Device A, B and C. The statistic of the data is shown below:

	Attributes	Dev.	Test
Subtask A	Binanural, 48 kHz, 24 bit	Device A: 8640	1200
Subtask B	Mono, 44.1 kHz	Device A: 8640 Device B: 720 Device C: 720	2400
Subtask C	-	Any data	1200

The log mel spectrogram of the scenes are shown below:

0. Requirements

Conda should be installed on the system.

Install Anaconda or conda
Run the install dependencies script:

conda env create -f environment.yml

This creates conda environment DCASE with all the dependencies.

1. Time-frequency features (TFFs) extraction

Please follow the instructions in here to prepare the TFFs of all the audio files.

2. Integrate multiple feature files.

python matlab_feature_conversion.py --project_dir path to DMA-Net

3. Test the data loading function and normalize the features.

python DatasetsManager.py --project_dir path to DMA-Net

4. Run the main script.

python main.py --project_dir path to DMA-Net

The script is configured to conduct 10 trials by default, with the final results being aggregated for statistical analysis.

You should obtain results similar to the following, which may vary based on your software and hardware configuration.

final_results

Acknowledgment

Our project references the codes in the following repos.

Citation

If you find this repository helpful, please consider citing:

@article{xie2022deep,
  title={Deep mutual attention network for acoustic scene classification},
  author={Xie, Wei and He, Qianhua and Yu, Zitong and Li, Yanxiong},
  journal={Digital Signal Processing},
  volume={123},
  pages={103450},
  year={2022},
  publisher={Elsevier}
}

chester-w-xie/DMA-Net