

License: Mozilla Public License 2.0 (MPL-2.0)

ITAcotron 2

Codebase for the papers "ITAcotron 2: the Power of Transfer Learning in Expressive TTS Synthesis" and "ITAcotron 2: Transfering English Speech Synthesis Architectures and Speech Features to Italian". For all the references, contributions and credits, please refer to the papers.

This code was originally developed as part of the M.Sc. thesis in Cognitive Science "Conditional Text to Speech by Means of Transfer Learning". The M.Sc. degree was awarded by the Center for Mind/Brain Sciences (CIMeC) of the Università degli Studi di Trento (UniTn). The thesis was supervised at Politecnico di Milano (PoliMI) by the staff of the ARCSlab.


To generate Italian clips, you can use the notebook at the following path:

Model weights

Links to download the weights of the trained models:

  • Tacotron 2 [ link ] (trained on Italian data)
  • FB-MelGAN Vocoder [ link ] and vocoder configuration file [ link ] (taken from the original repo, trained on English data)
  • Speaker Encoder [ link ] and speaker encoder configuration file [ link ] (taken from the original repo, trained on English data)

Changes from origin

The code in this repository is based on a fork of the Mozilla TTS repository. Please refer to the original repository for the documentation.

With respect to the original implementation, we modified the following files:

  • TTS/tts/datasets/preprocess.py
  • TTS/tts/datasets/TTSDataset.py
  • TTS/tts/utils/text/__init__.py
  • TTS/tts/utils/text/cleaners.py
  • TTS/tts/utils/text/symbols.py
  • TTS/bin/train_tacotron.py
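The list above only names the files that were changed. As a rough illustration of the kind of Italian-specific text handling that cleaners.py and symbols.py are responsible for, here is a minimal sketch. It is not the code in this repository: the names italian_cleaners_sketch, unknown_symbols and ITALIAN_SYMBOLS_SKETCH, the character inventory, and the abbreviation table are all hypothetical.

```python
import re
import unicodedata

# Hypothetical sketch, NOT the code in TTS/tts/utils/text/cleaners.py or symbols.py.
# It only illustrates the kind of Italian-specific normalization those files handle.

# Assumed character inventory: lowercase Latin letters plus Italian accented vowels.
_PUNCTUATION = "!'(),-.:;? "
_LETTERS = "abcdefghijklmnopqrstuvwxyz"
_ITALIAN_ACCENTED = "àèéìíòóùú"
ITALIAN_SYMBOLS_SKETCH = set(_PUNCTUATION + _LETTERS + _ITALIAN_ACCENTED)

# Illustrative abbreviation expansions (assumption, not taken from the repository).
_ABBREVIATIONS = [
    (re.compile(r"\bsig\."), "signor"),
    (re.compile(r"\bdott\."), "dottor"),
    (re.compile(r"\becc\."), "eccetera"),
]
_WHITESPACE_RE = re.compile(r"\s+")


def italian_cleaners_sketch(text: str) -> str:
    """Compose accents, lowercase, expand a few abbreviations, tidy whitespace."""
    # NFC turns "e" + combining accent into a single code point such as "é".
    text = unicodedata.normalize("NFC", text).lower()
    # Typographic apostrophes (common in elisions such as "l’amico") become ASCII.
    text = text.replace("\u2019", "'")
    for pattern, expansion in _ABBREVIATIONS:
        text = pattern.sub(expansion, text)
    return _WHITESPACE_RE.sub(" ", text).strip()


def unknown_symbols(text: str) -> set:
    """Characters of the text that the assumed inventory does not cover."""
    return {ch for ch in text if ch not in ITALIAN_SYMBOLS_SKETCH}
```

For example, italian_cleaners_sketch("Il  Sig. Rossi è arrivato!") gives "il signor rossi è arrivato!". Running unknown_symbols over cleaned transcripts is a quick way to spot characters (digits, currency signs) that would need their own normalization step before training.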

The code was taken from this commit.

Added configurations

Configuration files added for the training of Italian TTS:

  • TTS/tts/configs/config_first_finetuning.json
  • TTS/tts/configs/config_second_finetuning.json
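To see what changes between the two fine-tuning stages, one option is to load both files and compare their settings key by key. The sketch below assumes the files follow the upstream Mozilla TTS convention of allowing // comments, which Python's json module rejects, so it strips them (outside string literals) before parsing. The helper names strip_line_comments, load_commented_json, flatten and diff_configs are hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def strip_line_comments(source: str) -> str:
    """Remove // comments that appear outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(source):
        ch = source[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif source.startswith("//", i):
            # Skip to the end of the line but keep the newline itself.
            newline = source.find("\n", i)
            i = len(source) if newline == -1 else newline
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_commented_json(path) -> dict:
    """Read a JSON file that may contain // comments (assumed upstream convention)."""
    return json.loads(strip_line_comments(Path(path).read_text(encoding="utf-8")))


def flatten(config: dict, prefix: str = "") -> dict:
    """Turn nested dicts into {"a.b.c": value} so settings can be compared key by key."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def diff_configs(first_path, second_path) -> dict:
    """Map each differing dotted key to (first_value, second_value); absent keys read as None."""
    first = flatten(load_commented_json(first_path))
    second = flatten(load_commented_json(second_path))
    return {
        key: (first.get(key), second.get(key))
        for key in sorted(first.keys() | second.keys())
        if first.get(key) != second.get(key)
    }
```

For example, diff_configs("TTS/tts/configs/config_first_finetuning.json", "TTS/tts/configs/config_second_finetuning.json") returns a dict from each dotted key whose value differs to a (first, second) pair. A key missing from one file is reported as None, so in this sketch an explicit null cannot be told apart from an absent key.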

Cite work

If you use our code, please cite our work using the following BibTeX entries:

	Address = {Trento, Italy},
	Author = {Favaro, Anna and Sbattella, Licia and Tedesco, Roberto and Scotti, Vincenzo},
	Booktitle = {Proceedings of The Fourth International Conference on Natural Language and Speech Processing (ICNLSP 2021)},
	Month = {12--13 } # nov,
	Pages = {83--88},
	Publisher = {Association for Computational Linguistics},
	Title = {{ITA}cotron 2: Transfering {E}nglish Speech Synthesis Architectures and Speech Features to {I}talian},
	Url = {https://aclanthology.org/2021.icnlsp-1.10},
	Year = {2021}

	Author={Favaro, Anna and Sbattella, Licia and Tedesco, Roberto and Scotti, Vincenzo},
	Editor={Abbas, Mourad},
	Title={ITAcotron 2: the Power of Transfer Learning in Expressive TTS Synthesis},
	BookTitle={Analysis and Application of Natural Language and Speech Processing},
	Publisher={Springer International Publishing},


We wish to thank all the contributors to the "TTS: Text-to-Speech for all" repository for their help.