YUCHEN005
Ph.D. student at NTU; research focuses on speech, multimodal learning, and LLMs.
Nanyang Technological University, Singapore
YUCHEN005's Stars
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning. (See the usage sketch after this list.)
academicpages/academicpages.github.io
Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
FunAudioLLM/CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
jasonppy/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Vchitect/Latte
Latte: Latent Diffusion Transformer for Video Generation.
open-mmlab/FoleyCrafter
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos. 😝
lochenchou/MOSNet
Implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"
shikiw/OPERA
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
YUCHEN005/STAR-Adapt
Code for paper "Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models"
YUCHEN005/GenTranslate
Code for paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators"
YUCHEN005/RobustGER
Code for paper "Large Language Models are Efficient Learners of Noise-Robust Speech Recognition"
Hypotheses-Paradise/Hypo2Trans
Single-blind supplementary materials for NeurIPS 2023 submission
shikiw/Modality-Integration-Rate
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
YUCHEN005/NASE
Code for paper "Noise-aware Speech Enhancement using Diffusion Probabilistic Model"
shikiw/DAM-VP
[CVPR 2023] Diversity-Aware Meta Visual Prompting
shikiw/Awesome-MLLM-Hallucination
Papers about Hallucination in Multi-Modal Large Language Models (MLLMs)
shikiw/SI-Adv
[CVPR 2022] Shape-invariant Adversarial Point Clouds
YUCHEN005/Unified-Enhance-Separation
Code for paper "Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation"
YUCHEN005/DPSL-ASR
Code for paper "Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition"
soumimaiti/speechlmscore_tool
YUCHEN005/UniVPM
Code for paper "Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition"
YUCHEN005/GILA
Code for paper "Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition"
YUCHEN005/Gradient-Remedy
Code for paper "Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition"
YUCHEN005/MIR-GAN
Code for paper "MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition"
YUCHEN005/RATS-Channel-A-Speech-Data
A public repository for RATS Channel-A Speech Data, a paid noisy speech dataset distributed by LDC. It releases the dataset's log-Mel filterbank features and several raw waveform listening samples.
Hypotheses-Paradise/UADF
YUCHEN005/UNA-GAN
Code for paper "Unsupervised Noise Adaptation Using Data Simulation"
shirley-wu/daco
[NeurIPS 2024 D&B Track] DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation
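
For the facebookresearch/audiocraft entry above, here is a minimal sketch of text-conditioned music generation with MusicGen, based on the library's publicly documented Python API; the checkpoint name, prompt text, duration, and output file names are illustrative assumptions, not values taken from this page.

    # Minimal MusicGen sketch (assumption: follows audiocraft's documented API;
    # checkpoint, prompt, and duration below are illustrative choices).
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained('facebook/musicgen-small')  # small pretrained checkpoint
    model.set_generation_params(duration=8)                     # generate ~8 seconds of audio

    descriptions = ['calm lo-fi beat with soft piano']          # text conditioning
    wav = model.generate(descriptions)                          # tensor of shape [batch, channels, samples]

    for idx, one_wav in enumerate(wav):
        # Write each generated sample as a loudness-normalized audio file.
        audio_write(f'musicgen_sample_{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")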