Pinned Repositories
100-Days-Of-ML-Code
100-Days-Of-ML-Code (Chinese edition)
996.ICU
Repo for counting stars and contributing. Press F to pay respect to glorious developers.
AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision
ai-audio-datasets-list
This is a list of datasets consisting of speech, music, and sound effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications. It is mainly used for speech recognition, speech synthesis, singing voice synthesis, music information retrieval, music generation, etc.
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
AnyGPT
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
ATen
ATen: A TENsor library for C++11
athena
An open-source implementation of a sequence-to-sequence-based speech processing engine
audio-ai-timeline
A timeline of the latest AI models for audio generation, starting in 2023!
pan310's Repositories
pan310/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
pan310/AnyGPT
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
pan310/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
pan310/AudioSR-Upsampling
AudioSR-Upsampling (any -> 48kHz)
pan310/brouhaha-vad
Predicts the level of noise and reverberation in your audio files
pan310/ChatTTS
ChatTTS is a generative speech model for daily dialogue.
pan310/CosyVoice
Multilingual large voice generation model providing full-stack inference, training, and deployment capabilities.
pan310/demucs
Code for the paper "Hybrid Spectrogram and Waveform Source Separation"
pan310/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
pan310/detail_tts
All generative models in one for a better TTS model
pan310/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
pan310/emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
pan310/fish-speech
Brand new TTS solution
pan310/flash-attention
Fast and memory-efficient exact attention
pan310/GPT-SoVITS
Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
pan310/grok-1
Grok open release
pan310/KazEmoTTS
An open-source Kazakh Emotional Text-to-Speech Dataset
pan310/MahaTTS
pan310/megatts2
Unofficial implementation of MegaTTS 2
pan310/Montreal-Forced-Aligner
Command line utility for forced alignment using Kaldi
pan310/NAST
Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11037
pan310/Neural-Transducers-for-Two-Stage-Text-to-Speech-via-Semantic-Token-Prediction
Unofficial PyTorch reproduction of the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)
pan310/OpenVoice
Instant voice cloning by MyShell.
pan310/parler-tts
Inference and training library for high-quality TTS models.
pan310/PL-BERT
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
pan310/speech-to-speech
pan310/UniCATS-CTX-txt2vec
[AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS
pan310/UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
pan310/VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
pan310/voicefixer
General Speech Restoration