Onssen (pronounced おんせん, 温泉, Japanese for "hot spring") is a PyTorch-based library for speech separation, speech enhancement, and speech style transformation.
- Provide template classes for data, model, and evaluation
- Move models to separate folders (i.e. Kaldi style)
- Reproduce scores and upload pretrained models
- Finish inference method for online separation
- Add evaluation method for deep clustering
- Use W_{MR} weight in deep clustering
- Minor changes
Supported models:
- Deep Clustering
- Chimera Net
- Chimera++
- Phase Estimation Network
- Speech Enhancement with Restoration Layers
Supported datasets:
- WSJ0-2mix (http://www.merl.com/demos/deep-clustering)
- DAPS (https://archive.org/details/daps_dataset)
- Edinburgh-TTS (https://datashare.is.ed.ac.uk/handle/10283/2791)
Dependencies:
- PyTorch
- librosa
- NumPy
To train an enhancement or separation model, use an existing JSON config file or customize your own. For example, from the egs/wsj0-2mix/deep_clustering/ directory, run:
python run.py -c config.json
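Customizing a config usually means copying the shipped JSON file and changing a few values. The following is a minimal sketch of doing that programmatically, using only the Python standard library. The helper names (`set_by_path`, `write_variant`) and the keys in the demo (`train`, `batch_size`, `num_epochs`) are hypothetical and chosen only for illustration; check the `config.json` in the recipe directory for the keys Onssen actually reads.

```python
import copy
import json
import tempfile
from pathlib import Path


def set_by_path(config, dotted_key, value):
    """Return a copy of `config` with the nested key `dotted_key` set to `value`."""
    updated = copy.deepcopy(config)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        # Create intermediate dictionaries if they do not exist yet.
        node = node.setdefault(name, {})
    node[leaf] = value
    return updated


def write_variant(base_path, out_path, overrides):
    """Load a JSON config, apply dotted-key overrides, and write the result."""
    config = json.loads(Path(base_path).read_text())
    for dotted_key, value in overrides.items():
        config = set_by_path(config, dotted_key, value)
    Path(out_path).write_text(json.dumps(config, indent=4))
    return config


if __name__ == "__main__":
    # Hypothetical keys, for illustration only; not Onssen's actual schema.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "config.json"
        base.write_text(json.dumps({"train": {"batch_size": 16, "num_epochs": 100}}))
        variant = write_variant(base, Path(tmp) / "my_config.json",
                                {"train.batch_size": 8})
        print(variant)
```

The resulting file can then be passed to the training script in the same way as the default one, e.g. `python run.py -c my_config.json`.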
If you use Onssen in your research, please cite it using one of the following BibTeX entries:
@article{ni2019onssen,
title={Onssen: an open-source speech separation and enhancement library},
author={Ni, Zhaoheng and Mandel, Michael I},
journal={arXiv preprint arXiv:1911.00982},
year={2019}
}
@Misc{onssen,
author = {Zhaoheng Ni and Michael Mandel},
title = "ONSSEN: An Open-source Speech Separation and Enhancement Library",
howpublished = {\url{https://github.com/speechLabBcCuny/onssen}},
year = {2019}
}