Pinned Repositories
AutoVocoder
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing
Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models, a dark horse in the field of Generative Models
diffusion-audio-restoration-nvidia-SR
Audio-to-Audio Schrodinger Bridges is a diffusion-based audio restoration model for bandwidth extension and inpainting.
F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
golf_diff_Glottal_Flow_LPC_synthesis
A DDSP-based neural vocoder.
MB-iSTFT-VITS2_super-monotonic-align
Application of MB-iSTFT-VITS components to vits2_pytorch
tacospawn
PyTorch implementation of TacoSpawn, Speaker Generation
unconditional-diff-STFT
Unconditional music synthesis using a diffusion model in the STFT domain
WaveletAttention
Wavelet-Attention CNNs for Image Classification
SynthAether's Repositories
SynthAether/eben
Repo for source code of EBEN: Extreme Bandwidth Extension Network
SynthAether/Large-Audio-Models
Keep track of big models in the audio domain, including speech, singing, music, etc.
SynthAether/OpenVoice
Instant voice cloning by MyShell
SynthAether/alltalk_tts
AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, and wav file maintenance. It can also be used with third-party software via JSON calls.
SynthAether/asmgen_SIMD
Generator for select AVX/AVX2/FMA/AVX512/NEON/SVE/RVV inline assembly instructions for use with C/C++
SynthAether/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
SynthAether/basic-pitch
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
SynthAether/build-nanogpt
Video+code lecture on building nanoGPT from scratch
SynthAether/DDSP-SVC
End-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)
SynthAether/denoising-diffusion-pytorch
Implementation of Denoising Diffusion Probabilistic Model in PyTorch
SynthAether/DEX-TTS
DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability
SynthAether/genmusic_demo_list
a list of demo websites for automatic music generation research
SynthAether/LLaMA-Factory
Unify Efficient Fine-tuning of 100+ LLMs
SynthAether/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
SynthAether/llm.c
LLM training in simple, raw C/CUDA
SynthAether/NeMo
NeMo: a toolkit for conversational AI
SynthAether/normalizing-flows
PyTorch implementation of normalizing flow models
SynthAether/RWKV-LM
RWKV is an RNN with transformer-level performance. It can be directly trained like a GPT transformer (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
SynthAether/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
SynthAether/seed-tts-eval
SynthAether/sentence-transformers
Multilingual Sentence & Image Embeddings with BERT
SynthAether/sgmse_Speech-Enhancement-and-Dereverberation-with-Diffusion-based-Generative-Models
Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
SynthAether/stable-audio-tools
Generative models for conditional audio generation
SynthAether/STFT
[C++] STFT, ISTFT, mel-filterbank modules
SynthAether/StreamSpeech_SpeechTranslation
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
SynthAether/tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
SynthAether/torbi_Viterbi_decoding_in_PyTorch
Viterbi decoding in PyTorch
SynthAether/torchlpc_LPC
LPC with PyTorch
SynthAether/utmos
A toolkit to calculate speech audio quality. Not affiliated with the original authors
SynthAether/x-transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers