Pinned Repositories
adsv_voting
algorithm_analysis_practice
algorithm analysis practice
asv-subtools
An open-source toolkit for speaker recognition
BJFUThesis
LaTeX thesis template for Beijing Forestry University (北京林业大学)
BlogManger
CAT
A CRF-based ASR Toolkit
Crawler
A simple crawler written in Java
CustomerManagement
A simple customer management system using the MVC pattern, with development documentation :bowtie:
d-vector-language-recognition
d-vector-based language identification using PyTorch
mmam_loss
MMAM loss for language identification
hangxiu's Repositories
hangxiu/mmam_loss
MMAM loss for language identification
hangxiu/adsv_voting
hangxiu/asv-subtools
An open-source toolkit for speaker recognition
hangxiu/BJFUThesis
LaTeX thesis template for Beijing Forestry University (北京林业大学)
hangxiu/CAT
A CRF-based ASR Toolkit
hangxiu/d-vector-language-recognition
d-vector-based language identification using PyTorch
hangxiu/DNF
A PyTorch implementation of DNF, accompanying the paper "Deep normalization for speaker vectors"
hangxiu/EER_and_minDCF
Calculate EER (equal error rate) and minDCF (minimum detection cost function)
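The EER is the operating point at which a verification system's false-accept rate equals its false-reject rate. A minimal pure-Python sketch of the threshold sweep follows; the `compute_eer` helper is illustrative only (not this repository's code), and it assumes label 1 marks target trials and that higher scores mean "more target-like":

```python
def compute_eer(labels, scores):
    """Approximate the equal error rate by sweeping every score
    as a candidate acceptance threshold and returning the mean of
    FAR and FRR at the point where they are closest."""
    pairs = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)              # number of target trials
    n_neg = len(labels) - n_pos      # number of non-target trials
    best_gap, eer = float("inf"), 1.0
    for t, _ in pairs:
        fa = sum(1 for s, l in pairs if s >= t and l == 0)  # false accepts
        fr = sum(1 for s, l in pairs if s < t and l == 1)   # false rejects
        far, frr = fa / n_neg, fr / n_pos
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

For perfectly separable scores the sweep finds a threshold with no errors on either side, so the EER is 0.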
hangxiu/mce2018
An implementation of an open-set speaker recognition system for the 1st Multi-target speaker detection and identification Challenge Evaluation (MCE 2018)
hangxiu/meta-SR
Pytorch implementation of Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs (Interspeech, 2020)
hangxiu/MetricEmbeddingNet
A collection of my research on metric loss learning, speaker embedding, identification, diarisation and more. Visit lucvanwyk.wixsite.com/website for details.
hangxiu/netvlad-in-speech
An end-to-end framework using NetVLAD for language identification and speaker recognition.
hangxiu/OpenTransformer
A No-Recurrence Sequence-to-Sequence Model for Speech Recognition
hangxiu/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
hangxiu/pytorch-metric-learning
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
hangxiu/se_relative_loss
(TensorFlow) Speech enhancement using a relative loss
hangxiu/SpeakerRecognition_tutorial
Simple d-vector-based speaker recognition (verification and identification) using PyTorch
hangxiu/SpecAugmentPyTorch
A PyTorch implementation (with batch and channel support) of Google Brain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
hangxiu/Speech-Enhancement-Measures
Speech enhancement metrics: CSIG, CBAK, CMOS, SSNR, PESQ, STOI, ESTOI, SNR, IS, LLR, WSS
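Several of these metrics reduce to simple array arithmetic over the clean and enhanced waveforms. As an illustration, a minimal sketch of the global SNR (the simplest metric in the list), assuming both signals are aligned NumPy arrays of equal length; the `snr_db` helper is hypothetical, not this repository's implementation:

```python
import numpy as np

def snr_db(clean, estimate):
    """Global signal-to-noise ratio of an enhanced signal, in dB:
    ratio of clean-signal energy to residual-noise energy."""
    noise = clean - estimate
    return 10 * np.log10(np.sum(clean ** 2) / (np.sum(noise ** 2) + 1e-12))
```

Scaling the clean signal by 0.9 leaves a residual with 1% of the original energy, so the sketch reports roughly 20 dB in that case; perceptual metrics such as PESQ or STOI require full reference implementations and cannot be sketched this briefly.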
hangxiu/Speech_Signal_Processing_and_Classification
Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification: developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) can also be derived. These traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian mixture model classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].
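The framing step described above (splitting the utterance into overlapping windowed segments before computing any short-term spectral feature) can be sketched in a few lines of NumPy. The helper names and the default frame/hop sizes (25 ms / 10 ms at 16 kHz) are illustrative assumptions, not this repository's API:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping Hamming-windowed frames
    (400 samples = 25 ms and 160 samples = 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    win = np.hamming(frame_len)
    return np.stack([x[i * hop : i * hop + frame_len] * win
                     for i in range(n)])

def log_magnitude_spectra(frames, n_fft=512):
    """Per-frame log-magnitude spectrum: the common starting point
    from which MFCCs, LPCs, and PLPs diverge."""
    return np.log(np.abs(np.fft.rfft(frames, n_fft)) + 1e-10)
```

From the log-magnitude spectra, MFCCs would additionally apply a mel filterbank and a DCT, while LPCs fit the all-pole model directly to the windowed frames.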
hangxiu/speechbrain
A PyTorch-based Speech Toolkit
hangxiu/StreamingSpeakerDiarization
Lightweight Python library for real-time speaker diarization, implemented in PyTorch
hangxiu/tf-kaldi-speaker
Neural speaker recognition/verification system based on Kaldi and Tensorflow
hangxiu/ulaw-SGAN-for-SE
u-law SGAN for speech enhancement
hangxiu/VAD
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
hangxiu/Voice-Authentication-Pytorch-Densenet121
A simple voice authentication system using DenseNet121
hangxiu/voxceleb_trainer
In defence of metric learning for speaker recognition
hangxiu/voxceleb_unsupervised
Baseline for the VoxSRC 2020 self-supervised speaker verification
hangxiu/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
hangxiu/WINVC
Official implementation of "WINVC: One-Shot Voice Conversion with Weight Adaptive Instance Normalization".