Pinned Repositories
Auffusion
Official code and models for the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
BUPT-Projects
Coursework projects from BUPT
BUPT_grs_lesson
BUPT (Beijing University of Posts and Telecommunications) course-grabbing script (for graduate students)
Comprehensive-Transformer-TTS
A non-autoregressive Transformer-based text-to-speech system, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS
data-struct-oj
Online judge (OJ) exercises for the data structures course
descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
espeak-ng
eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
FastSpeech2
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
happylittlecat2333.github.io
MOSNet
Implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"
happylittlecat2333's Repositories
happylittlecat2333/Auffusion
Official code and models for the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
happylittlecat2333/FastSpeech2
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
happylittlecat2333/happylittlecat2333.github.io
happylittlecat2333/MOSNet
Implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"
happylittlecat2333/BUPT-Projects
Coursework projects from BUPT
happylittlecat2333/BUPT_grs_lesson
BUPT (Beijing University of Posts and Telecommunications) course-grabbing script (for graduate students)
happylittlecat2333/Comprehensive-Transformer-TTS
A non-autoregressive Transformer-based text-to-speech system, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS
happylittlecat2333/data-struct-oj
Online judge (OJ) exercises for the data structures course
happylittlecat2333/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
happylittlecat2333/espeak-ng
eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
happylittlecat2333/github-pages-demo
happylittlecat2333/gitignore
A collection of useful .gitignore templates
happylittlecat2333/hifi-gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
happylittlecat2333/icassp2023
happylittlecat2333/interspeech2022
happylittlecat2333/interspeech2024
happylittlecat2333/interspeech2024-RAG
happylittlecat2333/iscslp2022
happylittlecat2333/mm2023
happylittlecat2333/pokemon
happylittlecat2333/ppt-to-txt
convert .pptx file to .txt file
happylittlecat2333/pytorch_xvectors
Deep speaker embeddings in PyTorch, including x-vectors. Code used in this work: https://arxiv.org/abs/2007.16196
happylittlecat2333/travel_simulation
happylittlecat2333/TTS-1
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production