Pinned Repositories
1-billion-word-language-modeling-benchmark
Formerly known as code.google.com/p/1-billion-word-language-modeling-benchmark
3D-convolutional-speaker-recognition
🔊 Deep Learning & 3D Convolutional Neural Networks for Speaker Verification
3D-R2N2
Single/multi view image(s) to voxel reconstruction using a recurrent neural network
3d_face_gcns
PyTorch implementation of the 3d_face_gcn + audioDVP paper
4D-Facial-Avatars
Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction
955.WLB
955: a list of companies that do not require overtime
996.ICU
Repo for counting stars and contributing. Press F to pay respect to glorious developers.
MultiToneNet
recorder
ToneNet
ToneNet: A CNN Model of Tone Classification of Mandarin Chinese
saber5433's Repositories
saber5433/agc
Audiogen Codec
saber5433/aimoneyhunter
The Ultimate Guide to Making Money with AI Side Hustles: a collection of AI side projects that shows how to use AI to earn extra income. Check out the English version for more insights.
saber5433/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
saber5433/AudioLDM2
Text-to-Audio/Music Generation
saber5433/Bert-VITS2
vits2 backbone with multilingual-bert
saber5433/CLAP
Contrastive Language-Audio Pretraining
saber5433/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
saber5433/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
saber5433/edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
saber5433/GeneFacePlusPlus
GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code
saber5433/Genshin_Datasets
Genshin Datasets For SVC/SVS/TTS
saber5433/istft-onnx
Export an ONNX graph that performs ISTFT. Designed for TTS models.
saber5433/LangSegment
A multilingual (97 languages) tool that automatically recognizes and segments mixed-language text, designed for TTS.
saber5433/LibriTTS-P
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
saber5433/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
saber5433/Make-An-Audio-2
A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.
saber5433/mamba
Mamba SSM architecture
saber5433/Matcha-TTS
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
saber5433/megatts2
Unofficial implementation of Megatts2
saber5433/minbpe
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
saber5433/open_clip
An open source implementation of CLIP.
saber5433/OpenAI-CLIP
Simple implementation of OpenAI CLIP model in PyTorch.
saber5433/SimpleSpeech
The open source code for SimpleSpeech series
saber5433/StoryTTS
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
saber5433/super-monotonic-align
saber5433/Supercodec
saber5433/TextrolSpeech
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP)
saber5433/tts-scores
Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models
saber5433/UniCATS-CTX-txt2vec
[AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS
saber5433/vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis