Pinned Repositories
3D-Speaker
A repository for single- and multi-modal speaker verification, speaker recognition, and speaker diarization.
adaptive_voice_conversion
advoc
Vocode spectrograms to audio with generative adversarial networks
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
annotated_deep_learning_paper_implementations
🧑🏫 50! Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
ASRT_SpeechRecognition
A Deep-Learning-Based Chinese Speech Recognition System
Fay
Voice interaction and automated livestream selling with a virtual digital human
OpenVoice
Instant voice cloning
spleeter
Deezer source separation library including pretrained models.
TTS-frontend
TTS frontend with BERT and CRF/LSTM (for Tacotron)
macroustc's Repositories
macroustc/OpenVoice
Instant voice cloning
macroustc/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
macroustc/audino
Open source audio annotation tool for humans
macroustc/Awesome-Talking-Face
📖 A curated list of resources dedicated to talking face.
macroustc/Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
macroustc/Awesome-Video-Diffusion-Models
[Arxiv] A Survey on Video Diffusion Models
macroustc/Bert-VITS2
vits2 backbone with bert
macroustc/ChatTTS
ChatTTS is a generative speech model for daily dialogue.
macroustc/DeepLearningSystem
Deep Learning System core principles introduction.
macroustc/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
macroustc/EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
macroustc/fish-speech
Brand new TTS solution
macroustc/GPT-SoVITS
Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
macroustc/jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
macroustc/LLaSM
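The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors it may introduce.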
macroustc/llm-paper-daily
Daily updated LLM papers. Subscriptions welcome 👏 If you like it, give it a star 🌟
macroustc/minisora
The Mini Sora project aims to explore the implementation path and future development direction of Sora.
macroustc/NISQA
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
macroustc/Open-Sora
Building your own video generation model like OpenAI's Sora
macroustc/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model), but we have only limited resources. We sincerely hope the whole open-source community can contribute to this project.
macroustc/piper
A fast, local neural text to speech system
macroustc/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
macroustc/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
macroustc/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
macroustc/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
macroustc/Speech-Resources
Speech-related labs, companies, resources, internships, and more; recommendations and self-nominations are welcome
macroustc/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
macroustc/UniAudio
The Open Source Code of UniAudio
macroustc/VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
macroustc/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild