You can extract exact segments of laughter from various kinds of speech audio using the trained model and code. You can also train your own model.
The code, annotations, and model are described in the following paper: Taisei Omine, Kenta Akita, and Reiji Tsuruno, "Robust Laughter Segmentation with Automatic Diverse Data Synthesis," Interspeech 2024.
```bash
git clone https://github.com/omine-me/LaughterSegmentation.git
cd LaughterSegmentation
python -m pip install -r requirements.txt
# ↓ Depends on your environment. See https://pytorch.org/get-started/locally/
python -m pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
```
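To confirm that PyTorch and torchaudio were installed correctly and that CUDA is visible, a minimal sanity check like the following may help:

```python
# Sanity check: verify that torch/torchaudio import and CUDA is visible.
import torch
import torchaudio

print("torch:", torch.__version__)
print("torchaudio:", torchaudio.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```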
Running in a venv is recommended. Also, download `model.safetensors` from Hugging Face (1.26 GB) and place it in the `models` directory, making sure the file is named `model.safetensors`. Python <= 3.11 is required (#2). Tested on Windows 11 with a GeForce RTX 2060 SUPER.
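As a small sketch (assuming the `models/model.safetensors` layout described above), you can verify that the model file is in place before running inference:

```python
# Sketch: check that the downloaded model sits where inference expects it.
# Assumes the layout described above: models/model.safetensors (~1.26 GB).
from pathlib import Path

model_path = Path("models") / "model.safetensors"
if not model_path.is_file():
    raise FileNotFoundError(f"{model_path} not found; download it from Hugging Face first.")
size_gb = model_path.stat().st_size / 1024**3
print(f"Found {model_path} ({size_gb:.2f} GB)")  # expect roughly 1.26 GB
```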
- Prepare an audio file.
- Open a terminal and go to the directory where `inference.py` is located.
- Run `python inference.py --audio_path audio.wav`, replacing `audio.wav` with the path to your own audio. Common audio formats such as `mp3`, `wav`, and `opus` are supported; 16 kHz WAV audio is fastest. If the audio fails to load, run the following commands, and also download FFmpeg and add it to your PATH:
  ```bash
  python -m pip uninstall pysoundfile
  python -m pip uninstall soundfile
  python -m pip install soundfile
  ```
- To change the output directory, use the `--output_dir` option. To use your own model, use the `--model_path` option.
- Results are saved in the output directory in JSON format (see the sketch after this list). To visualize the results, you can use this site (not perfect because it's for debugging).
Read the README in the `train` directory.
Read the README in the `evaluation` directory.
This repository is MIT-licensed, but the publicly available trained model is currently available for research use only.
Cite as: Omine, T., Akita, K., Tsuruno, R. (2024) Robust Laughter Segmentation with Automatic Diverse Data Synthesis. Proc. Interspeech 2024, 4748-4752, doi: 10.21437/Interspeech.2024-1644
or
```bibtex
@inproceedings{omine24_interspeech,
  title     = {Robust Laughter Segmentation with Automatic Diverse Data Synthesis},
  author    = {Taisei Omine and Kenta Akita and Reiji Tsuruno},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {4748--4752},
  doi       = {10.21437/Interspeech.2024-1644},
}
```
Use GitHub Issues or reach out to me on X (Twitter).