audio-processing

There are 3,270 repositories under the audio-processing topic.

SincNet
SincNet is a neural architecture for efficiently processing raw audio samples.
Language: Python · Stars: 1.2k
StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Language: Python · Stars: 1.1k
awesome-audio-dsp
My curated list of audio DSP and plugin development resources
Stars: 1.1k
audino
Open source audio annotation tool for humans
Language: JavaScript · Stars: 1.1k
chromaprint
C library for generating audio fingerprints used by AcoustID
Language: C++ · Stars: 1.1k
nnAudio
Audio processing using PyTorch 1D convolution networks
Language: Python · Stars: 1.1k
DawDreamer
Digital Audio Workstation with Python; VST instruments/effects, parameter automation, FAUST, JAX, Warp Markers, and JUCE processors
Language: C++ · Stars: 1.1k
soundfingerprinting
Open source audio fingerprinting in .NET. An efficient algorithm for acoustic fingerprinting written purely in C#.
Language: C# · Stars: 1k
Wave-U-Net
Implementation of the Wave-U-Net for audio source separation
Language: Python · Stars: 909
SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
Language: Python · Stars: 890
audio-visualizer-android
🎵 [Android Library] A light-weight and easy-to-use Audio Visualizer for Android.
Language: Java · Stars: 883
klio
Smarter data pipelines for audio.
Language: Python · Stars: 856
Beethoven
🎸 A maestro of pitch detection.
Language: Swift · Stars: 850
XR3Player
🎧 🎼 The MOST ADVANCED JavaFX Media Player
Language: Java · Stars: 752
APT
AI Productivity Tool: free and open source, it improves user productivity and protects privacy and data security. Features include, but are not limited to, built-in local exclusive ChatGPT, DeepSeek, Phi, Qwen and other models, and one-click batch intelligent processing of pictures, videos, audio, etc.
Language: C# · Stars: 735
Awesome-Audio-LLM
Audio Large Language Models
Language: Python · Stars: 719
awesome-large-audio-models
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
Stars: 692
DTLN
TensorFlow 2.x implementation of the DTLN real-time speech denoising model, with TF-Lite, ONNX and real-time audio processing support.
Language: Python · Stars: 645
r8brain-free-src
High-quality pro audio resampler / sample rate conversion C++ library. Very fast, for both audio resampling and time-series interpolation.
Language: C++ · Stars: 637
fast-music-remover
A C++ based, lightweight music and noise remover for YouTube and other internet media, using DeepFilterNet for audio enhancement.
Language: C++ · Stars: 627
FoleyCrafter
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝
Language: Python · Stars: 626
PESQ
PESQ (Perceptual Evaluation of Speech Quality) Wrapper for Python Users (narrow band and wide band)
Language: C · Stars: 604
unsilence
Console Interface and Library to remove silent parts of a media file 🔈
Language: Python · Stars: 584
vectorhub
Vector Hub: a library for easy discovery and consumption of state-of-the-art models that turn data into vectors (text2vec, image2vec, video2vec, graph2vec, bert, inception, etc.)
Stars: 561
nara_wpe
Different implementations of "Weighted Prediction Error" for speech dereverberation
Language: Python · Stars: 532
Dplug
Make VST2 / VST3 / AU / AAX / CLAP / LV2 / FLP plug-ins for Linux/macOS/Windows, using D.
Language: D · Stars: 524
MediaEditor
Non-linear editing software that helps you make nice videos.
Language: C++ · Stars: 477
SamplerBox
SamplerBox is a sampler musical instrument based on the Raspberry Pi.
Language: Python · Stars: 461
ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
Language: Python · Stars: 453
musig
A Shazam-like tool to store song fingerprints and retrieve them
Language: Go · Stars: 442
surfboard
Novoic's audio feature extraction library
Language: Python · Stars: 437
jumpcutter
⏩ Fast-forwards long pauses between sentences — watch lectures ~1.5x faster (browser extension)
Language: TypeScript · Stars: 425
emotion-classification-from-audio-files
Understanding emotions from audio files using neural networks and multiple datasets.
Language: Python · Stars: 419
android-vad
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
Language: C · Stars: 408
scaper
A library for soundscape synthesis and augmentation
Language: Python · Stars: 407
whisper-at
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
Language: Python · Stars: 404