
INTERSPEECH 2023: Multi-band Time-frequency Attention Network for Singing Melody Extraction from Polyphonic Music


MTANet

Introduction

The official implementation of "MTANet: Multi-band Time-frequency Attention Network for Singing Melody Extraction from Polyphonic Music".

We propose a singing melody extractor for polyphonic music named the multi-band time-frequency attention network (MTANet). Experimental results show that MTANet achieves promising performance compared with existing state-of-the-art methods while keeping the number of network parameters small.

MTANet Architecture

Important updates

2023. 03. 19

(i) Due to an error on the author's part, Figure 3 in the submitted manuscript shows an outdated version of the figure, which may mislead reviewers and readers. I sincerely apologize for this! The revised figure is shown below for reference, and I will make the formal correction in a subsequent version of the manuscript.

Figure: Hourglass sub-network (revised version)

(ii) MMNet has been renamed to MTANet.

2023. 03. 20

The author has contacted the conference chairs and requested a correction. If the correction is accepted, please disregard the update above. I apologize for any inconvenience to reviewers and readers.

2023. 05. 20

The paper has been accepted to INTERSPEECH 2023; the official version will be available once the proceedings are released.

The remaining code will be cleaned up and released soon.

2023. 06. 11

All the code has been uploaded.

Getting Started

Download Datasets

After downloading the data, use the txt files in the data folder and extract the CFP (combined frequency and periodicity) features with feature_extraction.py.
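For readers unfamiliar with CFP, the single-frame sketch below shows its three components: a power-compressed spectrum (Z0), the generalized cepstrum (Z1, indexed by lag q, which corresponds to fs / q Hz), and the generalized cepstrum of spectrum (Z2). It is a minimal illustration, not the code in feature_extraction.py. The compression exponents (0.24, 0.6, 1.0) and the 80 Hz / 1 kHz high-pass cutoffs are assumed values, and the log-frequency filterbank that a full CFP pipeline applies afterwards is omitted.

```python
import numpy as np


def _relu_pow(x, gamma):
    # Rectify, then apply power-law compression (gamma = 1.0 keeps values linear)
    return np.maximum(x, 0.0) ** gamma


def _highpass(x, cutoff):
    # Zero the first `cutoff` bins and their mirrored counterparts so x stays symmetric
    x = x.copy()
    x[:cutoff] = 0.0
    if cutoff > 1:
        x[-(cutoff - 1):] = 0.0
    return x


def cfp_frame(frame, fs, gammas=(0.24, 0.6, 1.0), f_cut=80.0, t_cut=1.0 / 1000.0):
    """Sketch of Z0 (spectrum), Z1 (generalized cepstrum), Z2 (GCoS) for one frame.

    gammas, f_cut and t_cut are assumed defaults, not values taken from this repo.
    """
    n = len(frame)
    k_cut = int(round(f_cut * n / fs))  # spectral high-pass cutoff, in bins
    q_cut = int(round(t_cut * fs))      # quefrency high-pass cutoff, in samples

    # Z0: compressed magnitude spectrum (frequency representation)
    z0 = np.abs(np.fft.fft(frame * np.hanning(n)))
    z0 = _relu_pow(_highpass(z0, k_cut), gammas[0])

    # Z1: generalized cepstrum (periodicity representation); z0 is real and symmetric
    z1 = np.real(np.fft.ifft(z0))
    z1 = _relu_pow(_highpass(z1, q_cut), gammas[1])

    # Z2: generalized cepstrum of spectrum, back in the frequency domain
    z2 = np.real(np.fft.fft(z1))
    z2 = _relu_pow(_highpass(z2, k_cut), gammas[2])

    half = n // 2 + 1  # keep the non-redundant half of each symmetric vector
    return z0[:half], z1[:half], z2[:half]
```

Multiplying Z0 or Z2 with Z1 after mapping them onto a shared log-frequency axis is what lets CFP suppress harmonics and subharmonics; that mapping is the part left out here.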

Note that label files matching the frame shift (hop size) used for feature extraction must be available before the features are generated.
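This README does not specify the label format, so the sketch below assumes each annotation is a (time in seconds, f0 in Hz) list with 0 Hz marking unvoiced frames. It resamples the annotation onto the feature frame grid by nearest-neighbour lookup and then quantizes f0 into class indices. The helper names and the pitch grid in `hz_to_bin` (`fmin`, `bins_per_octave`, `n_bins`) are hypothetical; check feature_extraction.py and main.py for the resolution MTANet actually uses.

```python
import numpy as np


def align_labels(times, freqs, hop_s, n_frames):
    """Hypothetical helper: resample a (time, f0) annotation onto frames spaced hop_s apart."""
    times = np.asarray(times, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    frame_times = np.arange(n_frames) * hop_s
    # For each frame, take the annotation row that is closest in time
    idx = np.abs(frame_times[:, None] - times[None, :]).argmin(axis=1)
    return freqs[idx]


def hz_to_bin(f0, fmin=32.7, bins_per_octave=48, n_bins=288):
    """Hypothetical pitch grid: 0 = unvoiced, 1..n_bins = log-spaced pitch classes."""
    f0 = np.asarray(f0, dtype=float)
    out = np.zeros(f0.shape, dtype=int)
    voiced = f0 > 0
    b = np.round(bins_per_octave * np.log2(f0[voiced] / fmin)).astype(int)
    out[voiced] = np.clip(b, 0, n_bins - 1) + 1
    return out
```

For example, an annotation sampled every 10 ms and a feature hop of 20 ms yields one label per feature frame, which is the correspondence the note above requires.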

main.py is the main entry point of this project.

Model implementation

Refer to the file: mtanet.py

Our reimplementations of the comparison models have been uploaded and can be found in the folder: control group model.

Result

Prediction result

The visualization illustrates that MTANet can reduce both octave errors and melody detection errors.

Figure: example melody estimation results
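Octave errors show up as the gap between raw pitch accuracy (RPA) and raw chroma accuracy (RCA), the standard MIREX frame-level metrics alongside overall accuracy (OA). The sketch below computes all three with the usual 50-cent tolerance. It is a simplified illustration, not the evaluation code used for the reported scores: here a frame estimated as unvoiced never counts as a correct pitch, and f0 = 0 marks unvoiced frames. For published numbers, a reference implementation such as the melody module of mir_eval is preferable.

```python
import numpy as np


def _cents(f_hz):
    # Hz to cents relative to 10 Hz; only differences matter, so the reference is arbitrary
    return 1200.0 * np.log2(f_hz / 10.0)


def melody_scores(ref_f0, est_f0, tol_cents=50.0):
    """Simplified frame-level RPA, RCA and OA; f0 == 0 marks an unvoiced frame."""
    ref = np.asarray(ref_f0, dtype=float)
    est = np.asarray(est_f0, dtype=float)
    ref_v, est_v = ref > 0, est > 0
    both = ref_v & est_v

    raw_ok = np.zeros(ref.shape, dtype=bool)
    chroma_ok = np.zeros(ref.shape, dtype=bool)
    d = np.abs(_cents(est[both]) - _cents(ref[both]))
    raw_ok[both] = d <= tol_cents
    # Chroma: distance to the nearest whole number of octaves, so octave errors are forgiven
    d_oct = np.abs(d - 1200.0 * np.floor(d / 1200.0 + 0.5))
    chroma_ok[both] = d_oct <= tol_cents

    n_ref_voiced = max(int(ref_v.sum()), 1)
    rpa = raw_ok.sum() / n_ref_voiced
    rca = chroma_ok.sum() / n_ref_voiced
    # OA: unvoiced frames correctly left unvoiced plus voiced frames with the correct pitch
    oa = ((~ref_v & ~est_v).sum() + raw_ok.sum()) / len(ref)
    return {"RPA": float(rpa), "RCA": float(rca), "OA": float(oa)}


scores = melody_scores([0, 220, 220, 440, 0], [0, 220, 440, 0, 110])
# One octave error (220 Hz estimated as 440 Hz): RPA = 1/3, RCA = 2/3, OA = 0.4
```

In this toy example, the single octave error costs RPA a frame but not RCA, while the missed voiced frame and the false alarm lower OA.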

Comprehensive result

The scores here are either taken from the respective papers or obtained from our own implementations. Experimental results show that MTANet achieves promising performance compared with existing state-of-the-art methods.

Figure: comprehensive results

Ablation study result

We conducted seven ablation experiments to verify the effectiveness of each design component of the proposed network. Due to the page limit, the paper reports ablation results only on the ADC2004 dataset; more detailed results on ADC2004, MIREX 05, and MedleyDB are presented here.

Figure: ablation results on ADC2004

Figure: ablation results on MIREX 05

Figure: ablation results on MedleyDB

Download the pre-trained model

Refer to the contents of the folder: pre-train model.

Special thanks