YUCHEN005
Ph.D. student at NTU; research focuses on speech, multimodal learning, and LLMs.
Nanyang Technological University, Singapore
Pinned Repositories
DPSL-ASR
Code for paper "Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition"
GenTranslate
Code for paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators"
GILA
Code for paper "Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition"
Gradient-Remedy
Code for paper "Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition"
MIR-GAN
Code for paper "MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition"
NASE
Code for paper "Noise-aware Speech Enhancement using Diffusion Probabilistic Model"
RobustGER
Code for paper "Large Language Models are Efficient Learners of Noise-Robust Speech Recognition"
STAR-Adapt
Code for paper "Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models"
Unified-Enhance-Separation
Code for paper "Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation"
UniVPM
Code for paper "Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition"
YUCHEN005's Repositories
YUCHEN005/STAR-Adapt
Code for paper "Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models"
YUCHEN005/GenTranslate
Code for paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators"
YUCHEN005/RobustGER
Code for paper "Large Language Models are Efficient Learners of Noise-Robust Speech Recognition"
YUCHEN005/NASE
Code for paper "Noise-aware Speech Enhancement using Diffusion Probabilistic Model"
YUCHEN005/Unified-Enhance-Separation
Code for paper "Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation"
YUCHEN005/DPSL-ASR
Code for paper "Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition"
YUCHEN005/UniVPM
Code for paper "Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition"
YUCHEN005/GILA
Code for paper "Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition"
YUCHEN005/Gradient-Remedy
Code for paper "Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition"
YUCHEN005/MIR-GAN
Code for paper "MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition"
YUCHEN005/RATS-Channel-A-Speech-Data
This is a public repository for RATS Channel-A Speech Data, a paid noisy speech dataset distributed by LDC. Here we release its Log-Mel Fbank features and several raw waveform listening samples.
YUCHEN005/UNA-GAN
Code for paper "Unsupervised Noise Adaptation using Data Simulation"
YUCHEN005/RIO-TTS-demos
YUCHEN005/UNO-TTS-demos
YUCHEN005/UNA-GAN-Demo
YUCHEN005/Hypo2Trans
Single-blind supplementary materials for NeurIPS 2023 submission
YUCHEN005/yuchen005.github.io
AcadHomepage: A Modern and Responsive Academic Personal Homepage