Honee-W

Master @ Northwestern Polytechnical University

Honee-W's Stars

openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Language: Jupyter Notebook 26.5k 325 4033.4k
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Language: Python 21.1k 210 3922.2k
rlabbe/Kalman-and-Bayesian-Filters-in-Python
Kalman Filter book using Jupyter Notebook. Focuses on building intuition and experience, not formal proofs. Includes Kalman filters, extended Kalman filters, unscented Kalman filters, particle filters, and more. All exercises include solutions.
Language: Jupyter Notebook 16.9k 463 3284.2k
naklecha/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
Language: Jupyter Notebook 13.9k 98 181.1k
diff-usion/Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models
Language: HTML 11.2k 268 48951
facebookresearch/ImageBind
ImageBind: One Embedding Space to Bind Them All
Language: Python 8.4k 98 91777
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
Language: Python 8.1k 49 01.1k
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Jupyter Notebook 7.9k 76 223597
xiph/rnnoise
Recurrent neural network for audio noise reduction
Language: C 4.2k 149 206909
NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Language: Python 3.3k 60 105338
gpt-omni/mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Language: Python 3.2k 98 118286
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
Language: Python 2.7k 30 133221
facebookresearch/audio2photoreal
Code and dataset for photorealistic Codec Avatars driven from audio
Language: Python 2.7k 31 62262
eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
Language: Python 2.3k 19 83186
resemble-ai/resemble-enhance
AI powered speech denoising and enhancement
Language: Python 1.5k 18 52163
microsoft/NeuralSpeech
Language: Python 1.4k 33 126182
jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Language: Python 906 21 5549
microsoft/CLAP
Learning audio concepts from natural language supervision
Language: Python 501 14 2338
nachifur/RDDM
CVPR 2024: Residual Denoising Diffusion Models
Language: Python 409 2 4139
YuanGongND/ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
Language: Python 393 15 5338
ewan-xu/pyaec
A simple and efficient Python implementation of a series of adaptive filters for acoustic echo cancellation, including time-domain adaptive filters (LMS, NLMS, RLS, AP, Kalman), nonlinear adaptive filters (Volterra filter, functional link adaptive filters), and frequency-domain adaptive filters (frequency-domain adaptive filter, frequency-domain Kalman filter).
Language: Python 334 5 498
sp-uhh/storm
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
Language: Python 187 11 2225
chenzhuo1011/libri_css
Libri-CSS: dataset and evaluation pipeline
Language: Python 139 8 723
iSEE-Laboratory/DiffUIR
Official implementation of the CVPR 2024 paper "Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model"
Language: Python 111 1 238
Labbeti/aac-datasets
Audio Captioning datasets for PyTorch.
Language: Python 108 2 36
Sreyan88/GAMA
Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Language: Python 89 7 209
line/open-universe
Open implementation of UNIVERSE and UNIVERSE++ diffusion-based speech enhancement models.
Language: Python 75 5 29
RoyChao19477/PCS
Perceptual Contrast Stretching on Target Feature for Speech Enhancement (Accepted by INTERSPEECH 2022)
Language: MATLAB 56 2 27
RicherMans/SAT
Streaming Audiotransformers for online Audio tagging
Language: Python 41 4 44
frankenliu/LOAE
Language: Python 10 3 11
