XiaoYuanJun-zy's Stars
RanaCM/DSU-AVO
Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023
jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
AI-S2-Lab/FluentEditor
[InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency
walker-hyf/ECSS
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024)
pliang279/MultiBench
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
ahmetbersoz/chatgpt-prompts-for-academic-writing
This list of writing prompts covers a range of topics and tasks, including brainstorming research ideas, improving language and style, conducting literature reviews, and developing research plans.
ZebangCheng/Emotion-LLaMA
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Labmem-Zhouyx/CDFSE_FastSpeech2
The Official Implementation of “Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis”
PeiranLi0930/L-SVD
Large-Scale Selfie Video Dataset (L-SVD): A Benchmark for Emotion Recognition
BladeDancer957/DualGATs
Code for the ACL 2023 paper "DualGATs: Dual Graph Attention Networks for Emotion Recognition in Conversations"
PetarV-/GAT
Graph Attention Networks (https://arxiv.org/abs/1710.10903)
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Chris10M/Lip2Speech
A pipeline to read lips and generate speech for the read content, i.e., lip-to-speech synthesis.
FunAudioLLM/CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
dingchaoyue/AcFormer
983632847/All-in-One
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
wyang-vis/EIFNet
Event-based Motion Deblurring with Modality-Aware Decomposition and Recomposition
Jay1Zhang/AVFAS
DreamMr/EST
Expression Snippet Transformer for Robust Video-based Facial Expression Recognition
nku-zhichengzhang/CTEN
[CVPR 2023] This is the official implementation of "Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network"
sunlicai/MAE-DFER
MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition (ACM MM 2023)
liutaocode/TTS-arxiv-daily
Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)
fengdu78/deeplearning_ai_books
deeplearning.ai (notes and resources for Andrew Ng's deep learning courses)
GalaxyCong/StyleDubber
[ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"
VikParuchuri/texify
Math OCR model that outputs LaTeX and markdown
kornia/kornia
Geometric Computer Vision Library for Spatial AI
JeongHun0716/vsr-low
Visual Speech Recognition For Low-Resource Languages with Automatic Labels (ICASSP 2024)
YasserdahouML/VSR_test_set
WildVSR
facebookresearch/av_hubert
A self-supervised learning framework for audio-visual speech