MisakaMikoto96's Stars
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
yuanzhoulvpi2017/zero_nlp
Chinese NLP solutions (large models, data, models, training, inference)
archinetai/audio-ai-timeline
A timeline of the latest AI models for audio generation, starting in 2023!
lucidrains/soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
microsoft/SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
abertsch72/unlimiformer
Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
gitmylo/audio-webui
A web UI for various audio-related neural networks
gemelo-ai/vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
gabrielmittag/NISQA
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
gitmylo/bark-voice-cloning-HuBERT-quantizer
The code for the bark-voicecloning model. Training and inference.
clue-ai/PromptCLUE
PromptCLUE, a zero-shot learning model supporting all Chinese tasks
DmitryRyumin/INTERSPEECH-2023-Papers
INTERSPEECH 2023 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!
facebookresearch/AudioMAE
This repo hosts the code and models of "Masked Autoencoders that Listen".
sail-sg/MDT
Masked Diffusion Transformer is the SOTA for image synthesis. (ICCV 2023)
auspicious3000/contentvec
speech self-supervised representations
cientgu/VQ-Diffusion
dome272/MaskGIT-pytorch
PyTorch implementation of MaskGIT: Masked Generative Image Transformer (https://arxiv.org/pdf/2202.04200.pdf)
lochenchou/MOSNet
Implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"
Rongjiehuang/GenerSpeech
PyTorch Implementation of GenerSpeech (NeurIPS'22): a text-to-speech model towards zero-shot style transfer of OOD custom voice.
yangdongchao/SoundStorm
The reproduced code for Google's SoundStorm
yangkevin2/emnlp22-re3-story-generation
rishikksh20/SoundStorm-pytorch
Google's SoundStorm: Efficient Parallel Audio Generation
Moon0316/T2A
Project page for "Improving Few-shot Learning for Talking Face System with TTS Data Augmentation" (ICASSP 2023)
ga642381/SpeechGen
"SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts"
lifeiteng/SoundStorm
cpdu/unicats
elevenlabs/elevenlabs-docs
Documentation for elevenlabs.io/docs
hekaijie123/TATrack
Target-Aware Tracking with Long-term Context Attention
gitmylo/bark-data-gen
Create training data for training a voice cloner for bark text to speech.