Kyran0255/SpeechSynthesis

A collection of speech synthesis resources

Apache-2.0

Text-to-Speech Synthesis

A collection of resources on speech synthesis with deep learning

Lectures & Seminars

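Deep Learning That Reads Books Aloud (책 읽어주는 딥러닝; Taehoon Kim, 2017.11)
- A DEVIEW 2017 talk that explains Tacotron in an accessible way
Modu Labs (모두의 연구소) WaveNet study video (Seungil Kim, 2017.10)
- A video in which the presenter explains his understanding of WaveNet, including the online discussion that followed
Generative Model-Based Text-to-Speech Synthesis (Heiga Zen, 2017.02)
- Heiga Zen, one of the authors of the WaveNet paper, gives an overview of TTS technology in general and introduces WaveNet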

Dataset

CMU_ARCTIC (en)
- A US English dataset built for speech synthesis research by CMU's Language Technologies Institute
The LJ Speech Dataset (en)
- Hosted on Keith Ito's website; I could not find any information on where or why it was created
Blizzard 2012 (en)
- The dataset used in the Blizzard Challenge 2012, a corpus-based speech synthesis challenge
CSTR VCTK Corpus (en)
- English Multi-speaker Corpus for CSTR Voice Cloning Toolkit

Korean Corpus

KSS Dataset: Korean Single speaker Speech Dataset

WaveNet

Paper

WaveNet: A Generative Model for Raw Audio (2016.09)

Articles

WaveNet: A Generative Model for Raw Audio (DeepMind Blog)

Source Code

Multi-GPU

Training WaveNet takes so long that it seems impractical without multiple GPUs, so I have collected links to related code.

https://github.com/nakosung/tensorflow-wavenet/tree/multigpu (TensorFlow)
- WaveNet multi-GPU implementation
https://github.com/nakosung/tensorflow-wavenet/tree/model_parallel (TensorFlow)
- WaveNet model-parallelism implementation
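A common way to spread training over several GPUs is data parallelism: every GPU holds a full copy of the model, each processes its own slice of the batch, and the per-slice gradients are averaged before a single shared update. This contrasts with model parallelism, where the model itself is split across GPUs. The sketch below illustrates only the data-parallel idea in plain Python. It is not taken from the linked branches: the GPUs are simulated by a sequential loop, a toy linear model with mean squared error stands in for WaveNet, and the function names (`mse_grad`, `split_batch`, `data_parallel_step`) are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for a toy regression task


def mse_grad(w: float, b: float, batch: List[Sample]) -> Tuple[float, float]:
    """Gradient of the mean squared error of y_hat = w * x + b over one batch."""
    n = len(batch)
    dw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    db = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return dw, db


def split_batch(batch: List[Sample], num_devices: int) -> List[List[Sample]]:
    """Split a batch into equal-sized shards, one per (simulated) GPU."""
    if len(batch) % num_devices != 0:
        raise ValueError("batch size must be divisible by num_devices")
    size = len(batch) // num_devices
    return [batch[i * size:(i + 1) * size] for i in range(num_devices)]


def data_parallel_step(w: float, b: float, batch: List[Sample],
                       num_devices: int, lr: float) -> Tuple[float, float]:
    """One synchronous data-parallel SGD step (assumption: gradient averaging)."""
    shards = split_batch(batch, num_devices)
    # Each "tower" computes gradients on its own shard; on real hardware
    # these would run concurrently, one per GPU.
    grads = [mse_grad(w, b, shard) for shard in shards]
    # Average the per-shard gradients (the all-reduce step), then update once.
    dw = sum(g[0] for g in grads) / num_devices
    db = sum(g[1] for g in grads) / num_devices
    return w - lr * dw, b - lr * db


if __name__ == "__main__":
    # Fit y = 3x + 1 using 4 simulated GPUs, each seeing 2 of the 8 samples.
    data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
    w, b = 0.0, 0.0
    for _ in range(2000):
        w, b = data_parallel_step(w, b, data, num_devices=4, lr=0.02)
    print(round(w, 3), round(b, 3))
```

Because the shards have equal size, the average of the shard gradients equals the full-batch gradient, so each step matches what a single GPU would compute on the whole batch; the speedup on real hardware comes only from running the shards at the same time.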

Fast WaveNet

Paper

Fast Wavenet Generation Algorithm (2016.11)

Articles

Source Code

Parallel WaveNet

Paper

Parallel WaveNet: Fast High-Fidelity Speech Synthesis (2017.11)

Articles

High-fidelity speech synthesis with WaveNet (DeepMind Blog)

Source Code

https://github.com/kensun0/Parallel-Wavenet (not a complete implementation)

WaveRNN

Paper

Efficient Neural Audio Synthesis (2018.02)

Deep Voice

Paper

Deep Voice: Real-time Neural Text-to-Speech (2017.02)

Deep Voice 2

Paper

Deep Voice 2: Multi-Speaker Neural Text-to-Speech (2017.05)

Deep Voice 3

Paper

Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning (2017.10)

Source Code

Tacotron

Paper

Tacotron: Towards End-to-End Speech Synthesis (2017.05)

Source Code

https://github.com/keithito/tacotron
https://github.com/Kyubyong/tacotron
https://github.com/barronalex/Tacotron
https://carpedm20.github.io/tacotron/ (Multi-speaker Tacotron in TensorFlow)
- A project implementing Tacotron 1 together with the multi-speaker approach from Deep Voice 2

Tacotron 2

Paper

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (2017.12)

Articles

Tacotron 2: Generating Human-like Speech from Text (Google Research Blog)

Source Code

https://github.com/riverphoenix/tacotron2 (implemented)
https://github.com/Rayhane-mamah/Tacotron-2 (work in progress)
https://github.com/selap91/Tacotron2 (work in progress)
https://github.com/CapstoneInha/Tacotron2-rehearsal
https://github.com/A-Jacobson/tacotron2 (PyTorch)
https://github.com/maozhiqiang/tacotron_cn (implementation needs verification; Chinese)
https://github.com/LGizkde/Tacotron2_Tao_Shujie (needs checking)
https://github.com/ruclion/tacotron_with_style_control (Style Control)