Pinned Repositories
Whisper-Rocognition
avsr-chain
Audio-visual speech recognition using the chain model in the Kaldi toolkit
DCM_vgg_transformer
Dual cross-modality attention audio-visual speech recognition model based on a VGG transformer with a hybrid CTC/attention architecture, built with fairseq
dual_cross_modality-AVSR
Audio-visual speech recognition model with dual cross-modality attention, based on the sigmedia-AVSR code
kaldi
This is the official location of the Kaldi project.
PerceptualAudio
Perceptual Metrics of Audio - perceptually relevant loss function. DPAM and CDPAM
pytorch-tutorial
PyTorch Tutorial for Deep Learning Researchers
PhonMatchNet
Official implementation of "PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords" (INTERSPEECH 2023)
LeeYongHyeok's Repositories
LeeYongHyeok/DCM_vgg_transformer
Dual cross-modality attention audio-visual speech recognition model based on a VGG transformer with a hybrid CTC/attention architecture, built with fairseq
LeeYongHyeok/avsr-chain
Audio-visual speech recognition using the chain model in the Kaldi toolkit
LeeYongHyeok/dual_cross_modality-AVSR
Audio-visual speech recognition model with dual cross-modality attention, based on the sigmedia-AVSR code
LeeYongHyeok/kaldi
This is the official location of the Kaldi project.
LeeYongHyeok/PerceptualAudio
Perceptual Metrics of Audio - perceptually relevant loss function. DPAM and CDPAM
LeeYongHyeok/pytorch-tutorial
PyTorch Tutorial for Deep Learning Researchers