Pinned Repositories
AutoVocoder
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing
Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models, a dark horse in the field of Generative Models
Bert-VITS2
VITS2 backbone with BERT
bwe_historical_recordings-fork-
Bandwidth Extension of Historical Recordings using Generative Adversarial Networks (BEHM-GAN)
golf
A DDSP-based neural vocoder.
soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
tacospawn
PyTorch implementation of TacoSpawn, Speaker Generation
unconditional-diff-STFT
Unconditional music synthesis using a diffusion model in the STFT domain
VITS2_pytorch_fork_-p0p4
Unofficial VITS2 TTS implementation in PyTorch
WaveletAttention
Wavelet-Attention CNNs for Image Classification
shaun95's Repositories
shaun95/Bert-VITS2
VITS2 backbone with BERT
shaun95/golf
A DDSP-based neural vocoder.
shaun95/AAREfficient-Autoregressive-Audio-Modeling-via-Next-Scale-Prediction
[Official Implementation] Acoustic Autoregressive Modeling 🔥
shaun95/annotated_deep_learning_paper_implementations
🧑‍🏫 59 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
shaun95/audio-inpainting-diffusion
shaun95/AudioSep
Official implementation of "Separate Anything You Describe"
shaun95/BigVGAN-NVIDIA
Official implementation of BigVGAN in PyTorch
shaun95/diffsptk
A differentiable version of SPTK
shaun95/duplex-model
shaun95/e2-tts-pytorch
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in PyTorch
shaun95/e2_tts
shaun95/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
shaun95/LlamaVoice
LlamaVoice is a llama-based large voice generation model, providing inference and training ability.
shaun95/MagVITS
VITS with phoneme-level prosody modeling based on MaskGIT
shaun95/mamba
shaun95/Matcha-TTS
🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
shaun95/nano-llama31
nanoGPT style version of Llama 3.1
shaun95/PeriodWave
The official implementation of PeriodWave and PeriodWave-Turbo
shaun95/pipecat_framework-for-voice-and-multimodal-conversational-AI
Open Source framework for voice and multimodal conversational AI
shaun95/polymath_music_separation
Convert any music library into a music production sample-library with ML
shaun95/rfwave_vocoder
shaun95/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity and Number Detector
shaun95/taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
shaun95/tortoise-tts
A multi-voice TTS system trained with an emphasis on quality
shaun95/TriAAN-VC
TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
shaun95/voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
shaun95/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
shaun95/xlstm_fork
PyTorch implementation of the xLSTM model by Beck et al. (2024)
shaun95/yet-another-retnet
A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (https://arxiv.org/pdf/2307.08621.pdf)
shaun95/yt-dlp
A youtube-dl fork with additional features and fixes