audio-ai

There are 14 repositories under the audio-ai topic.

  • EmulationAI/awesome-large-audio-models

    Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.

  • narcotic-sh/senko

    Very fast, accurate speaker diarization

    Language:Python9310
  • zebbern/no-cost-ai

    A collection of no-cost AI websites offering models such as Claude 4 Sonnet/Opus, Grok 4, o3 Pro, and Gemini 2.5 Pro for free, and much more.

  • kyegomez/AudioFlamingo

    Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities"

    Language:Python40531
  • serp-ai/ai-text-to-audio-latent-diffusion

    text-to-audio-latent-diffusion

    Language:Python37618
  • ksasso1028/audio-reverb-removal

    Code to train a custom time-domain autoencoder to dereverb audio

    Language:Python16112
  • aaivu/KuralNet

    A deep learning-based Speech Emotion Recognition (SER) model trained primarily on Indian languages. Designed for applications in call centers, sentiment analysis, and accessibility tools.

    Language:Python7
  • domenicostefani/elk-audio-AI-tutorial

    Guide to deploying neural networks in VST plugins, with a specific focus on embedded devices using the Elk Audio OS

    Language:Jupyter Notebook6101
  • SoheilGtex/Voice-Cloning-SV2TTS-

    Safe, production-ready starter for voice cloning via SV2TTS (RTVC wrapper). CLI, tests, Docker, CI, pre-commit. No model weights included.

    Language:Python5
  • saoud30/Audio-AI

    🗣️ Audio AI: Your Audio & Video Transcription Powerhouse!

    Language:Python31
  • open-v2ai/podcast-ai

    Whether it’s text or a link, it can be turned into a podcast!

    Language:TypeScript1100
  • engasd999/senko

    ⚡ Accelerate speaker diarization with Senko, processing 1 hour of audio in just 5 seconds on powerful hardware—boost your audio analysis efficiency.

    Language:Python
  • hari7261/AgentPodcast-AI

    PodcastAgent uses advanced text-to-speech technology to create natural-sounding multi-speaker podcasts from any written content.

    Language:Python
  • SzymiczeQ/zanshin

    🎧 Navigate audio content by speaker with Zanshin, a media player that supports both YouTube and local files.

    Language:Svelte