audio-generation
There are 82 repositories under the audio-generation topic.
mudler/LocalAI
🤖 The free, open-source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: generate text, audio, video and images, voice cloning, distributed and P2P inference.
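Because LocalAI advertises itself as a drop-in replacement for the OpenAI API, existing OpenAI-style clients can point at a local instance instead. A minimal sketch, assuming a LocalAI server on `localhost:8080` (the port and model name `ggml-gpt4all-j` are illustrative — check your own deployment's configuration):

```python
import json
import urllib.request

# Assumed local endpoint; LocalAI serves an OpenAI-compatible API,
# commonly on port 8080. Adjust to match your deployment.
BASE_URL = "http://localhost:8080/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def send(payload: dict) -> dict:
    """POST the payload to the local server (requires a running LocalAI)."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("ggml-gpt4all-j", "Describe a rainstorm in one sentence.")
# send(payload)  # uncomment once a LocalAI instance is running
```

The same shape of request works for any client library that lets you override the OpenAI base URL.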
FunAudioLLM/CosyVoice
A multilingual large voice-generation model, providing full-stack support for inference, training, and deployment.
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
haoheliu/AudioLDM
AudioLDM: Generate speech, sound effects, music and beyond, with text.
haoheliu/AudioLDM2
Text-to-Audio/Music Generation
archinetai/audio-diffusion-pytorch
Audio generation using diffusion models, in PyTorch.
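Many of the repositories in this list (AudioLDM, audio-diffusion-pytorch, TANGO, MM-Diffusion, Auffusion) are built on diffusion models. A minimal, library-agnostic sketch of the forward (noising) process such models are trained against — the schedule values here are illustrative and not taken from any specific repo:

```python
import math
import random

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product abar_t = prod_{s<=t}(1 - beta_s) for a linear beta schedule."""
    alpha_bar, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar

def noise_waveform(x0, t, alpha_bar, rng):
    """DDPM forward step: x_t = sqrt(abar_t)*x0 + sqrt(1 - abar_t)*eps."""
    a = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps  # the denoiser network learns to predict eps from (xt, t)

rng = random.Random(0)
alpha_bar = make_alpha_bar(1000)
x0 = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]  # 1 s, 440 Hz tone
xt, eps = noise_waveform(x0, t=999, alpha_bar=alpha_bar, rng=rng)
```

At the final timestep `alpha_bar[t]` is close to zero, so `xt` is nearly pure noise; generation runs this process in reverse, denoising step by step. Real audio systems typically apply this to spectrograms or learned latents rather than raw waveforms.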
archinetai/audio-ai-timeline
A timeline of the latest AI models for audio generation, starting in 2023!
rsxdalv/tts-generation-webui
TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS, Stable Audio, Mars5, F5-TTS, ParlerTTS)
lucidrains/soundstorm-pytorch
Implementation of SoundStorm, efficient parallel audio generation from Google DeepMind, in PyTorch.
declare-lab/tango
A family of diffusion models for text-to-audio generation.
NVIDIA/BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
Yuan-ManX/ai-audio-datasets
AI Audio Datasets (AI-ADS) 🎵, including speech, music, and sound effects, providing training data for generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
researchmm/MM-Diffusion
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
modelscope/FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis, music generation, etc.
metame-ai/awesome-audio-plaza
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation.
v-iashin/SpecVQGAN
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
Yuan-ManX/audio-development-tools
This is a list of sound, audio and music development tools which contains machine learning, audio generation, audio signal processing, sound synthesis, spatial audio, music information retrieval, music generation, speech recognition, speech synthesis, singing voice synthesis and more.
cabralpinto/modular-diffusion
Python library for designing and training your own Diffusion Models with PyTorch.
FunAudioLLM/InspireMusic
InspireMusic: A Unified Framework for Music, Song, Audio Generation.
sony/bigvsan
PyTorch implementation of BigVSAN.
galgreshler/Catch-A-Waveform
Official PyTorch implementation of the paper "Catch-A-Waveform: Learning to Generate Audio from a Single Short Example" (NeurIPS 2021).
soham97/awesome-sound_event_detection
Reading list for research topics in Sound AI
happylittlecat2333/Auffusion
Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
archinetai/audio-data-pytorch
A collection of useful audio datasets and transforms for PyTorch.
archinetai/audio-diffusion-pytorch-trainer
Trainer for audio-diffusion-pytorch
ilaria-manco/word2wave
Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.
leopiney/neuralnoise
The AI Podcast Studio: generate podcast scripts and their audio versions with a team of AI workers in a Podcast Studio 🎙️📜
RoySheffer/im2wav
Official implementation of the pipeline presented in "I Hear Your True Colors: Image Guided Audio Generation".
sony/soundctm
PyTorch implementation of SoundCTM.
rsxdalv/bark-speaker-directory
Site for sharing Bark voices
olaviinha/NeuralTextToAudio
Text-prompt-steered synthetic audio generators.
rsxdalv/musicgen-prompts
Site for sharing MusicGen + AudioGen Prompts and Creations
Yuanshi9815/LiteFocus
[Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based text-to-audio (TTA) models, currently implemented on top of the base model AudioLDM2.
Bai-YT/ConsistencyTTA
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
soham97/sound_ai_progress
Tracking the state of the art and recent results (bibliography) on sound tasks.
0417keito/JEN-1-COMPOSER-pytorch
Unofficial implementation of JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation (https://arxiv.org/abs/2310.19180).