cyanbx's Stars
SuperKogito/SER-datasets
A collection of datasets for the purpose of emotion recognition/detection in speech.
ys1305/ML-hand
From-scratch implementations of various machine learning algorithms
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT-4o
ddlBoJack/emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
zju-vipa/Odyssey
Odyssey: Empowering Agents with Open-World Skills
lucidrains/DALLE2-pytorch
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch
Akshat4112/SpeakerDiff
SpeakerDiff: Denoising Diffusion Probabilistic Models on Speaker Embeddings
RickyL-2000/ROSVOT
Robust Singing Voice Transcription and MIDI Extraction
cyanbx/FastLTS
Implementation of FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis (ACM MM'22)
joannahong/Lip2Wav-pytorch
A PyTorch implementation of Lip2Wav
zehanwang01/FreeBind
bytedance/Make-An-Audio-2
A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.
cyanbx/Prompt-Singer
Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
roger-tseng/av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
yangdongchao/UniAudio
The Open Source Code of UniAudio
ccfddl/ccf-deadlines
⏰ Collaboratively track deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet). If you find it useful, please star this project, thanks!
LucidaLu/QAOA-with-fewer-qubits
Data and code repository for "QAOA with fewer qubits: a coupling framework to solve larger-scale Max-Cut problem".
RickyL-2000/AlignSTS
Findings of ACL 2023 | AlignSTS: a speech-to-singing (STS) model based on modality disentanglement and cross-modal alignment
pengsida/learning_research
My research experience
revsic/torch-nansy
Torch implementation of NANSY, Neural Analysis and Synthesis, arXiv:2110.14513
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Yirui-Wang/ZJU-CSE-Latex
LaTeX template for undergraduate theses at the College of Control Science and Engineering, Zhejiang University.
archinetai/audio-ai-timeline
A timeline of the latest AI models for audio generation, starting in 2023!
collabora/WhisperSpeech
An Open Source text-to-speech system built by inverting Whisper.
lyc8503/EasierConnect
Third-party open-source Go client for NJU EasyConnect; a reimplementation of the NJU EasyConnect protocol in Go
Mythologyli/zju-connect
Go implementation of the ZJU RVPN client
Mythologyli/ZJU-Connect-for-Windows
Qt-based ZJU network client
diff-usion/Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models
archinetai/audio-diffusion-pytorch
Audio generation using diffusion models, in PyTorch.