campustian's Stars
THUDM/ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model | Open-source bilingual dialogue language model
svc-develop-team/so-vits-svc
SoftVC VITS Singing Voice Conversion
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
nICEnnnnnnnLee/BilibiliDown
(GUI, multi-platform) Bilibili video downloader. Supports batch download of Watch Later, favorites, and uploader videos | Bilibili Video Downloader 😳
THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
Plachtaa/seed-vc
zero-shot voice conversion & singing voice conversion, with real-time support
gnobitab/RectifiedFlow
Official Implementation of Rectified Flow (ICLR2023 Spotlight)
sodaling/FastestBilibiliDownloader
Blazing-fast batch Bilibili video downloader | The fastest Bilibili video downloader
yangdongchao/AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
sp-uhh/sgmse
Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
zhenye234/LLaSA_training
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
dome272/MaskGIT-pytorch
PyTorch implementation of MaskGIT: Masked Generative Image Transformer (https://arxiv.org/pdf/2202.04200.pdf)
yxlu-0102/MP-SENet
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
yangdongchao/Text-to-sound-Synthesis
The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"
jishengpeng/WavChat
A Survey of Spoken Dialogue Models (60 pages)
thuhcsi/NeuCoSVC
lucidrains/rectified-flow-pytorch
Implementation of rectified flow and some of its follow-up research / improvements in PyTorch
haoheliu/AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
juansgomez87/datasets_emotion
This repository collects information about different data sets for Music Emotion Recognition.
sarulab-speech/jtubespeech
Plachtaa/FAcodec
Training code for FAcodec presented in NaturalSpeech3
leimao/PyTorch-Quantization-Aware-Training
PyTorch Quantization Aware Training Example
Text-to-Audio/Make-An-Audio-3
Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers
3loi/NaturalVoices
Labbeti/aac-metrics
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
minguinho26/Prefix_AAC_ICASSP2023
Official Implementation of "Prefix Tuning for Automated Audio Captioning (ICASSP 2023)"
xmusic-project/XMIDI_Dataset
XMIDI Dataset: A large-scale symbolic music dataset with emotion and genre labels.
xieh97/dcase2023-audio-retrieval
Baseline system for Language-based Audio Retrieval (Task 6B) in DCASE 2023 Challenge