campustian's Stars
THUDM/ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model | Open-source bilingual dialogue language model
svc-develop-team/so-vits-svc
SoftVC VITS Singing Voice Conversion
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
nICEnnnnnnnLee/BilibiliDown
(GUI, multi-platform) Bilibili video downloader. Supports batch download of Watch Later, favorites, and uploader videos | Bilibili Video Downloader 😳
THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
Plachtaa/seed-vc
zero-shot voice conversion & singing voice conversion, with real-time support
gnobitab/RectifiedFlow
Official Implementation of Rectified Flow (ICLR2023 Spotlight)
sodaling/FastestBilibiliDownloader
Blazing-fast batch Bilibili video downloader | The fastest Bilibili video downloader
yangdongchao/AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
sp-uhh/sgmse
Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
zhenye234/LLaSA_training
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
dome272/MaskGIT-pytorch
PyTorch implementation of MaskGIT: Masked Generative Image Transformer (https://arxiv.org/pdf/2202.04200.pdf)
yxlu-0102/MP-SENet
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
yangdongchao/Text-to-sound-Synthesis
The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"
jishengpeng/WavChat
A Survey of Spoken Dialogue Models (60 pages)
thuhcsi/NeuCoSVC
lucidrains/rectified-flow-pytorch
Implementation of rectified flow and some of its follow-up research / improvements in PyTorch
haoheliu/AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
juansgomez87/datasets_emotion
This repository collects information about different data sets for Music Emotion Recognition.
sarulab-speech/jtubespeech
Plachtaa/FAcodec
Training code for FAcodec presented in NaturalSpeech3
leimao/PyTorch-Quantization-Aware-Training
PyTorch Quantization Aware Training Example
Text-to-Audio/Make-An-Audio-3
Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers
3loi/NaturalVoices
Labbeti/aac-metrics
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
minguinho26/Prefix_AAC_ICASSP2023
Official Implementation of "Prefix Tuning for Automated Audio Captioning (ICASSP 2023)"
xmusic-project/XMIDI_Dataset
XMIDI Dataset: A large-scale symbolic music dataset with emotion and genre labels.
xieh97/dcase2023-audio-retrieval
Baseline system for Language-based Audio Retrieval (Task 6B) in DCASE 2023 Challenge