XiaoYuanJun-zy's Stars

RanaCM/DSU-AVO
Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023
Language:Python121
jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Language:Python62733
AI-S2-Lab/FluentEditor
[InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency
Language:Python442
walker-hyf/ECSS
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024)
Language:Python484
pliang279/MultiBench
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
Language:HTML47068
ahmetbersoz/chatgpt-prompts-for-academic-writing
This list of writing prompts covers a range of topics and tasks, including brainstorming research ideas, improving language and style, conducting literature reviews, and developing research plans.
2.8k231
ZebangCheng/Emotion-LLaMA
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Language:Python695
Labmem-Zhouyx/CDFSE_FastSpeech2
The Official Implementation of “Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis”
Language:Python8012
PeiranLi0930/L-SVD
Large-Scale Selfie Video Dataset (L-SVD): A Benchmark for Emotion Recognition
40743
BladeDancer957/DualGATs
Code for the ACL 2023 paper "DualGATs: Dual Graph Attention Networks for Emotion Recognition in Conversations"
Language:Python5812
PetarV-/GAT
Graph Attention Networks (https://arxiv.org/abs/1710.10903)
Language:Python3.2k642
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Language:Python4.7k382
Chris10M/Lip2Speech
A pipeline that reads lips and generates speech for the content read, i.e., lip-to-speech synthesis.
Language:Python7319
FunAudioLLM/CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
Language:Python4.7k474
dingchaoyue/AcFormer
Language:Python191
983632847/All-in-One
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
Language:Python12
pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
5.9k842
wyang-vis/EIFNet
Event-based Motion Deblurring with Modality-Aware Decomposition and Recomposition
Language:Python5
Jay1Zhang/AVFAS
Language:Python2
DreamMr/EST
Expression Snippet Transformer for Robust Video-based Facial Expression Recognition
Language:Python122
nku-zhichengzhang/CTEN
[CVPR 2023] This is the official implementation of "Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network"
Language:Python30
sunlicai/MAE-DFER
MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition (ACM MM 2023)
Language:Python8912
liutaocode/TTS-arxiv-daily
Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)
Language:Python21118
fengdu78/deeplearning_ai_books
deeplearning.ai (notes and resources for Andrew Ng's deep learning courses)
Language:HTML17.9k5.9k
GalaxyCong/StyleDubber
[ACL 2024] This is the Pytorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"
Language:Python322
VikParuchuri/texify
Math OCR model that outputs LaTeX and markdown
Language:Python78861
kornia/kornia
Geometric Computer Vision Library for Spatial AI
Language:Python9.8k956
JeongHun0716/vsr-low
Visual Speech Recognition For Low-Resource Languages with Automatic Labels (ICASSP 2024)
Language:Python8
YasserdahouML/VSR_test_set
WildVSR
Language:Python12
facebookresearch/av_hubert
A self-supervised learning framework for audio-visual speech
Language:Python830130