audio-generation

There are 82 repositories under the audio-generation topic.

  • mudler/LocalAI

    The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference

    Language: Go · Stars: 27.2k
  • FunAudioLLM/CosyVoice

    Multilingual large voice generation model with full-stack support for inference, training, and deployment.

    Language: Python · Stars: 8.5k
  • open-mmlab/Amphion

    Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

    Language: Jupyter Notebook · Stars: 7.9k
  • haoheliu/AudioLDM

    AudioLDM: Generate speech, sound effects, music and beyond, with text.

    Language: Python · Stars: 2.5k
  • haoheliu/AudioLDM2

    Text-to-Audio/Music Generation

    Language: Python · Stars: 2.3k
  • archinetai/audio-diffusion-pytorch

    Audio generation using diffusion models, in PyTorch.

    Language: Python · Stars: 2k
  • archinetai/audio-ai-timeline

    A timeline of the latest AI models for audio generation, starting in 2023!

  • rsxdalv/tts-generation-webui

    TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS, Stable Audio, Mars5, F5-TTS, ParlerTTS)

    Language: TypeScript · Stars: 1.9k
  • lucidrains/soundstorm-pytorch

    Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch

    Language: Python · Stars: 1.4k
  • declare-lab/tango

    A family of diffusion models for text-to-audio generation.

    Language: Python · Stars: 1.1k
  • NVIDIA/BigVGAN

    Official PyTorch implementation of BigVGAN (ICLR 2023)

    Language: Python · Stars: 921
  • Yuan-ManX/ai-audio-datasets

    AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.

  • researchmm/MM-Diffusion

    [CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

    Language: Python · Stars: 403
  • modelscope/FunCodec

    FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis and music generation.

    Language: Python · Stars: 375
  • metame-ai/awesome-audio-plaza

    Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation

  • v-iashin/SpecVQGAN

    Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)

    Language: Jupyter Notebook · Stars: 353
  • Yuan-ManX/audio-development-tools

    A list of sound, audio, and music development tools covering machine learning, audio generation, audio signal processing, sound synthesis, spatial audio, music information retrieval, music generation, speech recognition, speech synthesis, singing voice synthesis, and more.

  • cabralpinto/modular-diffusion

    Python library for designing and training your own Diffusion Models with PyTorch.

    Language: Python · Stars: 267
  • FunAudioLLM/InspireMusic

    InspireMusic: A Unified Framework for Music, Song, Audio Generation.

    Language: Python · Stars: 263
  • sony/bigvsan

    PyTorch implementation of BigVSAN

    Language: Python · Stars: 201
  • galgreshler/Catch-A-Waveform

    Official PyTorch implementation of the paper "Catch-A-Waveform: Learning to Generate Audio from a Single Short Example" (NeurIPS 2021)

    Language: Python · Stars: 189
  • soham97/awesome-sound_event_detection

    Reading list for research topics in Sound AI

  • happylittlecat2333/Auffusion

    Official code and models for the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"

    Language: Jupyter Notebook · Stars: 160
  • archinetai/audio-data-pytorch

    A collection of useful audio datasets and transforms for PyTorch.

    Language: Python · Stars: 137
  • archinetai/audio-diffusion-pytorch-trainer

    Trainer for audio-diffusion-pytorch

    Language: Python · Stars: 128
  • ilaria-manco/word2wave

    Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

    Language: Python · Stars: 119
  • leopiney/neuralnoise

    The AI Podcast Studio: generate podcast scripts and their audio versions with a team of AI workers in a Podcast Studio 🎙️📜

    Language: Python · Stars: 112
  • RoySheffer/im2wav

    Official implementation of the pipeline presented in "I Hear Your True Colors: Image Guided Audio Generation"

    Language: Python · Stars: 110
  • sony/soundctm

    PyTorch implementation of SoundCTM

    Language: Python · Stars: 75
  • rsxdalv/bark-speaker-directory

    Site for sharing Bark voices

    Language: TypeScript · Stars: 48
  • olaviinha/NeuralTextToAudio

    Text prompt steered synthetic audio generators

    Language: Jupyter Notebook · Stars: 45
  • rsxdalv/musicgen-prompts

    Site for sharing MusicGen + AudioGen Prompts and Creations

    Language: TypeScript · Stars: 40
  • Yuanshi9815/LiteFocus

    [Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based text-to-audio (TTA) models, currently implemented with AudioLDM2 as the base model.

    Language: Python · Stars: 33
  • Bai-YT/ConsistencyTTA

    ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

    Language: Python · Stars: 32
  • soham97/sound_ai_progress

    Tracking the state of the art and recent results (bibliography) on sound tasks.

  • 0417keito/JEN-1-COMPOSER-pytorch

    Unofficial implementation of JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation (https://arxiv.org/abs/2310.19180)

    Language: Python · Stars: 29