YUCHEN005
Ph.D. student at NTU; research focuses on speech, multimodal learning, and LLMs.
Nanyang Technological University, Singapore
YUCHEN005's Stars
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning. (See the usage sketch after this list.)
academicpages/academicpages.github.io
Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
FunAudioLLM/CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
jasonppy/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Vchitect/Latte
Latte: Latent Diffusion Transformer for Video Generation.
open-mmlab/FoleyCrafter
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos. 😝
lochenchou/MOSNet
Implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"
shikiw/OPERA
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
YUCHEN005/STAR-Adapt
Code for paper "Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models"
YUCHEN005/GenTranslate
Code for paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators"
YUCHEN005/RobustGER
Code for paper "Large Language Models are Efficient Learners of Noise-Robust Speech Recognition"
Hypotheses-Paradise/Hypo2Trans
Single-blind supplementary materials for NeurIPS 2023 submission
shikiw/Modality-Integration-Rate
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
YUCHEN005/NASE
Code for paper "Noise-aware Speech Enhancement using Diffusion Probabilistic Model"
shikiw/DAM-VP
[CVPR 2023] Diversity-Aware Meta Visual Prompting
shikiw/Awesome-MLLM-Hallucination
Papers about Hallucination in Multi-Modal Large Language Models (MLLMs)
shikiw/SI-Adv
[CVPR 2022] Shape-invariant Adversarial Point Clouds
YUCHEN005/Unified-Enhance-Separation
Code for paper "Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation"
YUCHEN005/DPSL-ASR
Code for paper "Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition"
soumimaiti/speechlmscore_tool
YUCHEN005/UniVPM
Code for paper "Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition"
YUCHEN005/GILA
Code for paper "Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition"
YUCHEN005/Gradient-Remedy
Code for paper "Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition"
YUCHEN005/MIR-GAN
Code for paper "MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition"
YUCHEN005/RATS-Channel-A-Speech-Data
A public repository for RATS Channel-A Speech Data, a paid noisy speech dataset distributed by LDC. It releases the dataset's log-Mel filterbank features and several raw waveform listening samples.
Hypotheses-Paradise/UADF
YUCHEN005/UNA-GAN
Code for paper "Unsupervised Noise Adaptation Using Data Simulation"
shirley-wu/daco
[NeurIPS 2024 D&B Track] DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation
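
For the facebookresearch/audiocraft entry above, here is a minimal sketch of text-conditioned music generation with MusicGen, based on the library's publicly documented Python API; the checkpoint name, prompt text, duration, and output file names are illustrative assumptions, not values taken from this page.

    # Minimal MusicGen sketch (assumption: follows audiocraft's documented API;
    # checkpoint, prompt, and duration below are illustrative choices).
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained('facebook/musicgen-small')  # small pretrained checkpoint
    model.set_generation_params(duration=8)                     # generate ~8 seconds of audio

    descriptions = ['calm lo-fi beat with soft piano']          # text conditioning
    wav = model.generate(descriptions)                          # tensor of shape [batch, channels, samples]

    for idx, one_wav in enumerate(wav):
        # Write each generated sample as a loudness-normalized audio file.
        audio_write(f'musicgen_sample_{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")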