LaughterSynthesis

This repository contains implementations of laughter synthesis systems, as well as the samples used for the MOS test in [2].

For reproducibility of the ICASSP 2015 paper [1]:

  • The HTS demo found here can be used to replicate the results described in this work.
  • The SpkB-Fr subject of the AmuS dataset was used to train this system.
  • For the AmuS dataset, please contact kevin [dot] elhaddad [at] umons [dot] ac [dot] be

For reproducibility of the Interspeech 2020 paper [2]:

  • Training process:

    • The DCTTS system is first trained on the Acapela speech together with the smiled speech and laughs from AmuS (preferably from subject SpkB).
    • The same system is then fine-tuned on those same AmuS smiled speech and laughs alone (a minimal training sketch is given after this list).
    • The MelGAN system is used off-the-shelf, without (re-)training.
  • Run time:

    • A sequence of laughter labels is given as input to the DCTTS, which generates a somewhat noisy waveform.
    • This waveform is then passed through the MelGAN system, which generates a "cleaned" version of it (see the run-time sketch after this list).
  • DCTTS-MelGAN implementations:

  • Concerning the data used in this work:

    • The Acapela voice used is a proprietary dataset and can therefore, unfortunately, not be distributed. However, the system should also work when pre-trained with other voices in a similar way to that described in [2].
    • For the AmuS dataset, please contact kevin [dot] elhaddad [at] umons [dot] ac [dot] be
  • The HTS-based system is the same as the one used for the ICASSP 2015 article [1].
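
The two-stage training described above is standard transfer learning: pre-train on the larger mixed corpus, then continue training on the AmuS material only. Below is a minimal, hedged PyTorch sketch of that idea; `Text2Mel`, the checkpoint path, and the synthetic batches are illustrative stand-ins, not the actual DCTTS code.

```python
import torch
from torch import nn, optim

class Text2Mel(nn.Module):
    """Toy stand-in for the DCTTS label-to-mel network."""
    def __init__(self, vocab_size=64, mel_dim=80):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, mel_dim)
        self.rnn = nn.GRU(mel_dim, mel_dim, batch_first=True)

    def forward(self, labels):
        out, _ = self.rnn(self.embed(labels))
        return out  # (batch, time, mel_dim)

model = Text2Mel()

# Stage 1: pre-training on the Acapela speech plus the AmuS smiled
# speech and laughs would happen here and produce a checkpoint, e.g.:
# model.load_state_dict(torch.load("pretrained_text2mel.pt"))

# Stage 2: fine-tune on the AmuS smiled speech and laughs only,
# typically with a small learning rate so the pre-trained weights
# are adapted rather than overwritten.
optimizer = optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.L1Loss()

for step in range(3):                       # stand-in for AmuS batches
    labels = torch.randint(0, 64, (8, 50))  # fake laughter-label IDs
    mel_target = torch.randn(8, 50, 80)     # fake mel-spectrogram targets
    optimizer.zero_grad()
    loss = criterion(model(labels), mel_target)
    loss.backward()
    optimizer.step()
```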
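The run-time chain can be sketched the same way: labels in, cleaned waveform out. All callables below are hypothetical stand-ins for the pre-loaded models; note that MelGAN vocoders normally consume mel spectrograms, so an intermediate mel-extraction step is assumed between the two systems.

```python
import torch

def synthesize_laugh(label_ids, dctts, mel_extract, melgan):
    """Run-time sketch: laughter labels -> DCTTS -> MelGAN.

    `dctts`, `mel_extract`, and `melgan` are assumed pre-loaded
    callables (hypothetical names): DCTTS emits a somewhat noisy
    waveform, from which a mel spectrogram is computed and handed
    to the (not re-trained) MelGAN to render the "cleaned" waveform.
    """
    with torch.no_grad():
        noisy_wav = dctts(label_ids)   # somewhat noisy waveform
        mel = mel_extract(noisy_wav)   # mel frames for the vocoder
        return melgan(mel)             # "cleaned" waveform

# Toy usage with dummy callables, just to show the data flow:
dctts = lambda ids: torch.randn(1, 22050)         # 1 s of fake audio
mel_extract = lambda wav: torch.randn(1, 80, 87)  # fake mel frames
melgan = lambda mel: torch.randn(1, 22050)
laugh = synthesize_laugh(torch.tensor([[3, 1, 4]]), dctts, mel_extract, melgan)
```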

References

[1] Kevin El Haddad, Stéphane Dupont, Jérôme Urbain, Thierry Dutoit, "Speech-laughs: an HMM-based approach for amused speech synthesis," in Proc. ICASSP, 2015.

[2] Noé Tits, Kevin El Haddad, Thierry Dutoit, "Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning," in Proc. Interspeech, 2020 (in press).