AI-based singing voice synthesis database generator

singing-database-maker

Preface

High-quality data is always a bottleneck in singing voice synthesis, and building a database from scratch is laborious. We hope AI can make the process much easier, so that every music lover can make their own synthesised songs.

Fortunately, existing tools already cover most of our needs. We combine them and add a tool to ease the process. Thanks to all the contributors of this great research. We list their work below, so you can go and check it:

[1] Sangeun Kum et al. “Semi-supervised learning using teacher-student models for vocal melody extraction”. In: Proc. International Society of Music Information Retrieval Conference (ISMIR). 2020.

[2] Xianming Li. XMNLP: A Lightweight Chinese Natural Language Processing Toolkit. https://github.com/SeanLee97/xmnlp. 2018.

[3] Kilian Schulze-Forster et al. “Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation”. In: IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021), pp. 2382–2395. DOI: 10.1109/TASLP.2021.3091817.

[4] Intelligence Engineering Lab @ Korea University. mdx-net-submission: Music demixer. https://github.com/kuielab/mdx-net-submission. 2021.

Input and Output

Notice: The project currently supports English songs only; Chinese support is in early development. The blocker is the phoneme-level lyrics alignment module, which we expect to address this summer.

We list the input and output of the toolkit here, so you can get a general idea of whether the project suits your needs.

Input: songs and their .lrc format lyrics.

Output:

  • songs divided into slices according to the lyric sentences
  • phoneme and word lists, with the times at which they appear in the slice
  • a MIDI file generated by the semi-supervised AI network
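
The first output item depends on the timestamps in the .lrc file. As an illustration only, here is a minimal sketch of reading .lrc time tags and turning them into per-sentence time ranges. It assumes the common `[mm:ss.xx]lyric` line format; the function names `parse_lrc` and `sentence_ranges` are hypothetical and are not taken from song_cutter.py.

```python
import re

# Matches one LRC time tag such as [00:15.30] or [01:02].
# Metadata tags like [ar:Someone] do not match because they do not start with digits.
TIME_TAG = re.compile(r"\[(\d+):(\d+(?:\.\d+)?)\]")


def parse_lrc(text):
    """Return a sorted list of (start_seconds, lyric) pairs."""
    entries = []
    for line in text.splitlines():
        tags = TIME_TAG.findall(line)
        lyric = TIME_TAG.sub("", line).strip()
        if not tags or not lyric:
            continue  # skip metadata lines and lines without lyrics
        for minutes, seconds in tags:
            entries.append((int(minutes) * 60 + float(seconds), lyric))
    return sorted(entries)


def sentence_ranges(entries, song_end):
    """Pair each lyric with (start, end); a sentence ends where the next one starts."""
    ranges = []
    for i, (start, lyric) in enumerate(entries):
        end = entries[i + 1][0] if i + 1 < len(entries) else song_end
        ranges.append((start, end, lyric))
    return ranges
```

Each `(start, end, lyric)` tuple describes one candidate slice; how the real tool names and trims its slices may differ.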

We plan to add:

  • musicXML generator
  • Chinese support
  • more precise midi file
  • a synthesised example using the database

Steps

Introduction

Suppose you have a song called foo. The processed database folder will look like this (only the tools you will use are listed):

 - origin
 	- foo.wav
 	- foo.lrc
 - processed_data
 	- vocal
 	- slice
 		- foo00150019
 			- foo00150019.wav
 			- foo00150019.txt
 		- ...
 	- pitch
 		- pitch_foo00150019
 		- ...
 	- midi
 		- foo00150019.mid
 		- ...
 	- align
 		- foo00150019
 			- phoneme_onsets
 				- foo00150019.txt
 			- word_onsets
 				- foo00150019.txt
 - utils
 	- english-align
 		- phoneme_from_word
 		- make_phoe.py
 	- melodyExtraction
 		- gen_freq.py
 	- vocal-extraction
 	- config.py
 	- song_cutter.py
 	- demix_vocal.py
 	- gen_midi.py
 	- make_Midi.py
 	- make_lab.py (not finished)
 	- make_musicxml.py (not finished)
 	- delete_useless.py (not tested)
 	- missing.txt (generated after alignment)
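
To make the layout easier to check, here is a minimal sketch that maps one slice name to the files the tree above expects for it. The helper names `slice_paths` and `missing_outputs` are hypothetical and not part of the project; only the folder and file names come from the listing.

```python
from pathlib import Path


def slice_paths(project_root, slice_name):
    """Return the expected per-slice files, following the folder listing above."""
    data = Path(project_root) / "processed_data"
    return {
        "audio": data / "slice" / slice_name / f"{slice_name}.wav",
        "text": data / "slice" / slice_name / f"{slice_name}.txt",
        "pitch": data / "pitch" / f"pitch_{slice_name}",
        "midi": data / "midi" / f"{slice_name}.mid",
        "phoneme_onsets": data / "align" / slice_name / "phoneme_onsets" / f"{slice_name}.txt",
        "word_onsets": data / "align" / slice_name / "word_onsets" / f"{slice_name}.txt",
    }


def missing_outputs(project_root, slice_name):
    """List the keys whose files do not exist yet for this slice."""
    return [key for key, path in slice_paths(project_root, slice_name).items()
            if not path.exists()]
```

For example, `missing_outputs("..", "foo00150019")` would show which processing steps have not yet produced output for that slice.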

Project download

Download the full project, and its submodules.

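We use submodules to ease our development, so you must use --recurse-submodules to clone the full project. If you have already cloned without it, run git submodule update --init --recursive inside the repository.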

$ git clone --recurse-submodules git@github.com:leavelet/singing-database-maker.git

Environment preparation

When dealing with multiple AI projects, setting up the environment properly at the first step will make your life much easier. We had a hard time dealing with all of this, and we found that you can run the project on your own PC if everything is set up correctly.

  1. Install a Python virtual environment manager

    We recommend conda and use it in the examples below.

  2. Create the environments

    $ cd requirements
    
    # misc
    $ conda create -n singing-dealer
    $ conda activate singing-dealer
    $ pip install -r make_midi.txt
    
    # vocal extraction
    $ conda env create -f vocal-extraction/environment.yml
    # on an ARM (Apple silicon) Mac, use environment-m1.yml instead
    $ conda activate vocal-extraction
    $ pip install -r vocal-extraction/requirements.txt
    
    # alignment & melody extraction
    $ conda env create -f vocal-extraction/maker_ai_cpu.yml
    # if you have a GPU, use maker_ai_gpu.yml instead
    $ conda activate maker_ai
    $ pip install -r melody_extraction.txt
  3. Download models

    1. Download demucs models
    $ conda activate vocal-extraction
    $ python download_demucs.py
    2. Download onnx models

    Download the models from the release page and put them in the utils/vocal-extraction/onnx folder.

  4. Download the make_dic tool and put it in the project root directory.

From now on, we assume your current directory is utils.

Setup your own config

We use config.py to control the whole project. All the scripts are written to follow its settings.

  1. Set the project root.

    The project root is the parent folder of utils. We use .. to mark the root since we are in the utils folder, but an absolute path is recommended, to avoid mistakes such as forgetting to change back to utils after an operation.

  2. Set your thread_num.

    We use parallelism to accelerate processing. Set thread_num to a value that makes full use of your processor. The default is 10.
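
As a rough illustration of these two settings, the sketch below resolves the root to an absolute path and fans work out over `thread_num` workers. Only the name `thread_num` and its default of 10 come from this README; `PROJECT_ROOT`, `process_all`, and the use of `ThreadPoolExecutor` are assumptions made for the example, and the real config.py may look different.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical setting name: resolve ".." once so later directory changes do not matter.
PROJECT_ROOT = Path("..").resolve()

# Number of parallel workers; 10 is the default mentioned above.
thread_num = 10


def process_all(items, worker, workers=thread_num):
    """Apply worker to every item using a pool of threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, items))
```

A thread pool suits work that mostly waits on I/O or on external programs; for CPU-bound pure-Python work a process pool would be the usual alternative.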

Separate songs

Demix the slices

Generate midi notes and make midi
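
As background for this step, here is a minimal sketch of the standard conversion from a fundamental frequency in Hz to a MIDI note number, followed by merging equal consecutive frames into notes. It assumes the pitch track is a list of per-frame frequencies with 0 for unvoiced frames; that format, and the names `hz_to_midi` and `frames_to_notes`, are assumptions for illustration and do not describe gen_midi.py or make_Midi.py.

```python
import math


def hz_to_midi(freq_hz, a4_hz=440.0):
    """Return the nearest MIDI note number, or None for unvoiced frames (freq <= 0)."""
    if freq_hz <= 0:
        return None
    return round(69 + 12 * math.log2(freq_hz / a4_hz))


def frames_to_notes(freqs, hop_seconds):
    """Merge consecutive frames with the same note into (note, start, end) tuples."""
    notes = []
    current, start = None, 0.0
    for i, freq in enumerate(freqs):
        note = hz_to_midi(freq)
        if note != current:
            if current is not None:
                notes.append((current, start, i * hop_seconds))
            current, start = note, i * hop_seconds
    if current is not None:
        notes.append((current, start, len(freqs) * hop_seconds))
    return notes
```

Rounding to the nearest semitone is the simplest choice; a real pipeline would likely also smooth the pitch track and drop very short notes before writing the MIDI file.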

Generate phoneme-level lyrics alignment

License and acknowledgement

The whole project is under the MIT License. All the projects used in this project remain under their own licenses.

We do not guarantee the quality of the dataset. Before using any data, you must have the appropriate copyright permission.