Honee-W's Stars
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
rlabbe/Kalman-and-Bayesian-Filters-in-Python
Kalman Filter book using Jupyter Notebook. Focuses on building intuition and experience, not formal proofs. Includes Kalman filters, extended Kalman filters, unscented Kalman filters, particle filters, and more. All exercises include solutions.
naklecha/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
diff-usion/Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models
facebookresearch/ImageBind
ImageBind: One Embedding Space to Bind Them All
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
xiph/rnnoise
Recurrent neural network for audio noise reduction
NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
gpt-omni/mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
facebookresearch/audio2photoreal
Code and dataset for photorealistic Codec Avatars driven from audio
eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
resemble-ai/resemble-enhance
AI powered speech denoising and enhancement
microsoft/NeuralSpeech
jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
microsoft/CLAP
Learning audio concepts from natural language supervision
nachifur/RDDM
CVPR 2024: Residual Denoising Diffusion Models
YuanGongND/ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
ewan-xu/pyaec
Simple and efficient Python implementation of a series of adaptive filters for acoustic echo cancellation, including time-domain adaptive filters (LMS, NLMS, RLS, AP, Kalman), nonlinear adaptive filters (Volterra filter, functional link adaptive filters), and frequency-domain adaptive filters (frequency-domain adaptive filter, frequency-domain Kalman filter).
sp-uhh/storm
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
chenzhuo1011/libri_css
Libri-CSS: dataset and evaluation pipeline
iSEE-Laboratory/DiffUIR
Official implementation of the CVPR 2024 paper: Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model
Labbeti/aac-datasets
Audio Captioning datasets for PyTorch.
Sreyan88/GAMA
Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
line/open-universe
Open implementation of UNIVERSE and UNIVERSE++ diffusion-based speech enhancement models.
RoyChao19477/PCS
Perceptual Contrast Stretching on Target Feature for Speech Enhancement (Accepted by INTERSPEECH 2022)
RicherMans/SAT
Streaming Audio Transformers for online audio tagging
frankenliu/LOAE