Pinned Repositories
alltalk_tts
AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, and wav file maintenance. It can also be used with third-party software via JSON calls.
AutoVocoder
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing
Bert-VITS2
vits2 backbone with bert
HiFTNet
pysdtw
soft DTW
soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
tacospawn
PyTorch implementation of TacoSpawn, Speaker Generation
unconditional-diff-STFT
Unconditional music synthesis using a diffusion model in the STFT domain
VITS2_pytorch_fork_-p0p4
Unofficial VITS2 TTS implementation in PyTorch
WaveletAttention
Wavelet-Attention CNNs for Image Classification
shaun95's Repositories
shaun95/WaveRNN-M
WaveRNN Vocoder + TTS
shaun95/alltalk_tts
AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, and wav file maintenance. It can also be used with third-party software via JSON calls.
shaun95/Bert-VITS2
vits2 backbone with bert
shaun95/soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
shaun95/build-nanogpt
Video+code lecture on building nanoGPT from scratch
shaun95/ChatTTS
ChatTTS is a generative speech model for daily dialogue.
shaun95/fish-speech
Brand new TTS solution
shaun95/genmusic_demo_list
a list of demo websites for automatic music generation research
shaun95/gpt-2
Code for the paper "Language Models are Unsupervised Multitask Learners"
shaun95/llama.cpp
Port of Facebook's LLaMA model in C/C++
shaun95/llama2.c
Inference Llama 2 in one file of pure C
shaun95/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
shaun95/llm.c
LLM training in simple, raw C/CUDA
shaun95/Make-An-Audio-2
a text-conditional diffusion probabilistic model capable of generating high fidelity audio.
shaun95/metavoice-src
AI for human-level speech intelligence
shaun95/minbpe
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
shaun95/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (Support 70B+ full tuning & LoRA & Mixtral & KTO)
shaun95/Phi-3CookBook
This is a cookbook for getting started with Phi-3, a family of open AI models developed by Microsoft. Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks.
shaun95/seed-tts-eval
shaun95/sentence-transformers
Multilingual Sentence & Image Embeddings with BERT
shaun95/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
shaun95/snake
SNAKE Inspired by "Neural Networks Fail to Learn Periodic Functions and How to Fix It"
shaun95/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
shaun95/stable-audio-tools
Generative models for conditional audio generation
shaun95/StreamSpeech_SpeechTranslation
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
shaun95/torbi_Viterbi_decoding_in_PyTorch
Viterbi decoding in PyTorch
shaun95/ultravox
shaun95/vector-quantize-pytorch
Vector Quantization, in Pytorch
shaun95/xlstm
Official repository of the xLSTM.
shaun95/yt-dlp
A youtube-dl fork with additional features and fixes