nukes's Stars
LAION-AI/natural_voice_assistant
fixie-ai/ultravox
liutaocode/TTS-arxiv-daily
Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)
Alpha-VLLM/Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
huggingface/parler-tts
Inference and training library for high-quality TTS models.
karpathy/llm.c
LLM training in simple, raw C/CUDA
jasonppy/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
mekumiao/ssml-editor
An editor with SSML syntax support, built on wangeditor
Vaibhavs10/insanely-fast-whisper
jim-schwoebel/voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
AppFlowy-IO/AppFlowy
AppFlowy is an open-source alternative to Notion. You are in charge of your data and customizations. Built with Flutter and Rust.
labmlai/annotated_deep_learning_paper_implementations
🧑‍🏫 60 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
resemble-ai/resemble-enhance
AI powered speech denoising and enhancement
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
princeton-nlp/LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
X-LANCE/UniCATS-CTX-vec2wav
[AAAI 2024] Code for CTX-vec2wav in UniCATS
collabora/WhisperSpeech
An Open Source text-to-speech system built by inverting Whisper.
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
google-research/magvit
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
turboderp/exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
modelscope/FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.
haoheliu/versatile_audio_super_resolution
Versatile audio super resolution (any -> 48kHz) with AudioSR.
mit-han-lab/streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
kakaobrain/magvlt
The official implementation of MAGVLT: Masked Generative Vision-and-Language Transformer (CVPR'23)
yangdongchao/UniAudio
The Open Source Code of UniAudio
hyn2028/llm-cxr
Official code for "LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation"
Computer-Vision-in-the-Wild/CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
dvlab-research/LongLoRA
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
hiyouga/LLaMA-Factory
Unify Efficient Fine-Tuning of 100+ LLMs
YuanGongND/whisper-at
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"