语音增强领域的相关数据仿真工具和方法汇总--持续更新
1:MASG, https://github.com/vipchengrui/MASG
功能实现:房间声学中的麦克风阵列语音发生器(MASG),仿真室内环境下的麦克风阵列采集数据
2:gpuRIR, https://github.com/DavidDiazGuerra/gpuRIR
功能实现:gpu加速的RIR仿真
FAST-RIR: Fast neural diffuse room impulse response generator
FAST-RIR:快速神经扩散房间脉冲响应发生器
链接:https://arxiv.org/abs/2110.04057
作者:Anton Ratnarajah,Shi-Xiong Zhang,Meng Yu,Zhenyu Tang,Dinesh Manocha,Dong Yu
机构: University of Maryland, College Park, MD, USA, Tencent AI Lab, Bellevue, WA, USA
备注:https://anton-jeran.github.io/FRIR/ https://github.com/anton-jeran/FAST-RIR
3:StoRIR,https://github.com/SRPOL-AUI/storir
功能实现:一种不同于RIR的随机房间脉冲响应生成方法,用于机器学习应用中的音频数据增强,不需要知道房间尺寸和声源位置
4:RIR,https://github.com/ehabets/RIR-Generator
功能描述:基于matlab版本的RIR生成算法仿真
5: DNS-Challenge,https://github.com/microsoft/DNS-Challenge
功能实现:微软开源的DNS挑战赛的数据和仿真脚本,可以直接进行数据的仿真,包括加混响和噪声
6:ConferenceSpeech2021,https://github.com/ConferencingSpeech/ConferencingSpeech2021/tree/master/eval
功能实现:腾讯组织的麦克风阵列语音增强比赛,其中的eval可以批处理生成pesq,stoi,si-sdr等客观评价指标,并输出文件,非常有用。
注:非常好的多通道仿真数据生成脚本,可以根据干净语音、噪声和冲击响应生成响应的多通道数据,包括混响参考数据和非混响参考数据,以及对应的带噪语音数据。
1:仿真训练所需数据时,给出干净语音,多通道冲击响应和噪声后,即可仿真生成多通道带噪混响语音;
2:训练时,默认为8通道数据,每个通道6s,配置时train.yaml中val_chun和train_chunk均为6
pyrirgen库备用安装工程:https://github.com/YoungJay0612/rir-generator-pyrirgen
7: AECMOS: A speech quality assessment metric for echo impairment
标题:AECMOS:一种回声损伤的语音质量评估指标
链接:https://arxiv.org/abs/2110.03010
作者:Marju Purin,Sten Sootla,Mateja Sponza,Ando Saabas,Ross Cutler
机构:Microsoft Corporation
link:https://github.com/IMLHF/Speech-Enhancement-Measures
speech enhancement metrics:CSIG, CBAK, CMOS, SSNR, PESQ, STOI, ESTOI, SNR, IS, LLR, WSS
link:https://github.com/kamo-naoyuki/pySIIB
pySIIB: A python implementation of speech intelligibility in bits (SIIB)
Audio generation using diffusion models, in PyTorch.基于diffusion模型的音频生成
link: https://github.com/archinetai/audio-diffusion-pytorch#experiments
link: https://github.com/YoungJay0612/padasip
通过Python进行Adaptive Signal Processing
link: https://github.com/bootphon/shennong
Shennong: A Python toolbox for speech features extraction
link:https://github.com/ZhihaoDU/speech_feature_extractor
GFCC等特征提取
Link: https://github.com/csteinmetz1/auraloss
auraloss: A collection of audio-focused loss functions in PyTorch.
paper title: SpecMix : A Mixed Sample Data Augmentation method for Training withTime-Frequency Domain Features
paper title: SpecMix:一种利用时频域特征训练的混合样本数据增强方法
链接:https://arxiv.org/abs/2108.03020
作者:Gwantae Kim,David K. Han,Hanseok Ko
github: https://github.com/GT-KIM/specmix
机构:Korea University, South Korea, Drexel University, USA
备注:Accepted to interspeech 2021
muda: https://github.com/bmcfee/muda https://muda.readthedocs.io/en/latest/
SpecAugment: https://github.com/DemisEom/SpecAugment
Facebook-Augly: https://github.com/facebookresearch/AugLy
Introc: AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations.
Asteroid-team: https://github.com/asteroid-team/torch-audiomentations
Introc: Audio data augmentation in PyTorch. Inspired by audiomentations.
Audiomentation: https://github.com/iver56/audiomentations
Introc: A Python library for audio data augmentation. Inspired by albumentations(图像数据增强).