MisakaMikoto96

Meow~ | Text-to-speech | USA

The University of Edinburgh | Tokiwadai

MisakaMikoto96's Stars

facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Language: Python 20.8k 202 3812.1k
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
Language: Python 13.8k 114 1.1k 1.3k
yuanzhoulvpi2017/zero_nlp
Chinese NLP solutions (large models, data, models, training, inference)
Language: Jupyter Notebook 2.9k 32 183359
archinetai/audio-ai-timeline
A timeline of the latest AI models for audio generation, starting in 2023!
1.9k 170 468
lucidrains/soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google Deepmind, in Pytorch
Language: Python 1.4k 52 2185
microsoft/SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Language: Python 1.2k 24 86112
abertsch72/unlimiformer
Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
Language: Python 1.1k 23 6080
gitmylo/audio-webui
A webui for different audio related Neural Networks
Language: Python 1.1k 22 19299
gemelo-ai/vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Language: Python 786 33 4689
gabrielmittag/NISQA
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
Language: Python 668 25 46117
gitmylo/bark-voice-cloning-HuBERT-quantizer
The code for the bark-voicecloning model. Training and inference.
Language: Python 655 18 43109
clue-ai/PromptCLUE
PromptCLUE, a zero-shot learning model that supports all Chinese-language tasks
Language: Jupyter Notebook 650 9 1968
DmitryRyumin/INTERSPEECH-2023-Papers
INTERSPEECH 2023 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!
616 87 442
facebookresearch/AudioMAE
This repo hosts the code and models of "Masked Autoencoders that Listen".
Language: Python 532 32 2844
sail-sg/MDT
Masked Diffusion Transformer is the SOTA for image synthesis. (ICCV 2023)
Language: Python 517 17 5138
auspicious3000/contentvec
speech self-supervised representations
Language: Python 462 11 3036
cientgu/VQ-Diffusion
Language: Python 436 6 3043
dome272/MaskGIT-pytorch
Pytorch implementation of MaskGIT: Masked Generative Image Transformer (https://arxiv.org/pdf/2202.04200.pdf)
Language: Python 405 15 1834
lochenchou/MOSNet
Implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"
Language: Python 335 10 1061
Rongjiehuang/GenerSpeech
PyTorch Implementation of GenerSpeech (NeurIPS'22): a text-to-speech model towards zero-shot style transfer of OOD custom voice.
Language: Python 316 17 2845
yangdongchao/SoundStorm
The reproduced code for Google's SoundStorm
Language: Python 247 20 2719
yangkevin2/emnlp22-re3-story-generation
Language: Python 245 12 447
rishikksh20/SoundStorm-pytorch
Google's SoundStorm: Efficient Parallel Audio Generation
Language: Python 124 17 513
Moon0316/T2A
Project page for "Improving Few-shot Learning for Talking Face System with TTS Data Augmentation" for ICASSP2023
Language: Python 82 5 811
ga642381/SpeechGen
"SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts"
74 8 25
lifeiteng/SoundStorm
70 16 04
cpdu/unicats
62 14 11
elevenlabs/elevenlabs-docs
Documentation for elevenlabs.io/docs
Language: MDX 60 17 174262
hekaijie123/TATrack
Target-Aware Tracking with Long-term Context Attention
Language: Python 48 3 221
gitmylo/bark-data-gen
Create training data for training a voice cloner for bark text to speech.
Language: Jupyter Notebook 44 3 410
