Pinned Repositories
3D-Speaker
A repository for single- and multi-modal speaker verification, speaker recognition, and speaker diarization.
adaptive_voice_conversion
advoc
Vocode spectrograms to audio with generative adversarial networks
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
annotated_deep_learning_paper_implementations
🧑‍🏫 50! Implementations/tutorials of deep learning papers with side-by-side notes 📝, including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), GANs (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
Fay
Voice interaction, automated live-stream selling, virtual digital human
GPT-SoVITS
Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
OpenVoice
Instant voice cloning
spleeter
Deezer source separation library including pretrained models.
TTS-frontend
TTS front end with BERT and CRF/LSTM (for Tacotron)
macroustc's Repositories
macroustc/GPT-SoVITS
Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
macroustc/Awesome-Talking-Face
📖 A curated list of resources dedicated to talking face.
macroustc/Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
macroustc/Awesome-Video-Diffusion-Models
[arXiv] A Survey on Video Diffusion Models
macroustc/ChatTTS
ChatTTS is a generative speech model for daily dialogue.
macroustc/Codecfake
This is the official repo of our work titled "The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio".
macroustc/CosyVoice
LLM-based TTS model with full-stack inference, training, and deployment capabilities.
macroustc/Deep-Live-Cam
Real-time face swap and one-click video deepfake from a single image
macroustc/DeepLearningSystem
An introduction to the core principles of deep learning systems.
macroustc/Diff-Foley
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
macroustc/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
macroustc/Draw-an-Audio-Code
Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.
macroustc/FireRedTTS
An Open-Source, LLM-Empowered Foundation TTS System
macroustc/fish-speech
A brand-new TTS solution
macroustc/g2pW
Chinese Mandarin Grapheme-to-Phoneme Converter. Converts Chinese text to Zhuyin or Pinyin (INTERSPEECH 2022)
macroustc/jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
macroustc/llm-paper-daily
Daily-updated LLM papers. Subscriptions welcome 👏 If you like it, please leave a 🌟
macroustc/mini-omni
An open-source multimodal large language model that can hear and talk while thinking. It features real-time end-to-end speech input and streaming audio output for conversation.
macroustc/minisora
The Mini Sora project aims to explore the implementation path and future development direction of Sora.
macroustc/moshi
A modern JSON library for Kotlin and Java.
macroustc/Open-Sora
Building your own video generation model like OpenAI's Sora
macroustc/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's text-to-video model), but we have only limited resources. We sincerely hope the entire open-source community will contribute to this project.
macroustc/punctuator
A small seq2seq punctuator tool based on DistilBERT
macroustc/Qwen-VL
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.
macroustc/S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
macroustc/SenseVoice
Multilingual Voice Understanding Model
macroustc/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
macroustc/TeleSpeech-ASR
macroustc/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
macroustc/yt-dlp
A feature-rich command-line audio/video downloader