Pzhang266

Universal Audio Processing (denoising, source separation, dereverberation ...)

Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China

Pinned Repositories

acoustic-scene-analysis-with-multihead-self-attention
This repo contains an implementation of the paper "Acoustic Scene Analysis With Multihead Self Attention" by Weimin Wang, Weiran Wang, Ming Sun, and Chao Wang from the Amazon Alexa team.
Language: Python
AEC-Challenge
AEC Challenge
Language: Python
audiosetdl
Scripts for downloading AudioSet
Language: Jupyter Notebook
av-se
Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
avobjects
Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video"
Language: Python
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
Awesome-Speech-Enhancement
A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.
Language: MATLAB
coder2gwy
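The internet's first guide for programmers taking the civil service exam, presented jointly by three former big-tech programmers who have since entered the public sector.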
DeepComplexCRN
Language: HTML
Optical-Flow-Guided-Feature
Implementation Code of the paper Optical Flow Guided Feature, CVPR 2018
Language: C++

Pzhang266's Repositories

Pzhang266/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
Pzhang266/AEC-Challenge
AEC Challenge
Language: Python
Pzhang266/coder2gwy
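The internet's first guide for programmers taking the civil service exam, presented jointly by three former big-tech programmers who have since entered the public sector.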
Pzhang266/DeepXi
Deep Xi: A deep learning approach to a priori SNR estimation implemented in TensorFlow 2/Keras. For speech enhancement and robust ASR.
Language: MATLAB
Pzhang266/dlib
A toolkit for making real world machine learning and data analysis applications in C++
Language: C++
Pzhang266/EMGFilters
Filter functions for processing EMG signals.
Language: C++
Pzhang266/fast_bss_eval
A fast implementation of bss_eval metrics for blind source separation
Language: Python
Pzhang266/FullSubNet
PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
Language: Python
Pzhang266/gpuRIR
Python library for Room Impulse Response (RIR) simulation with GPU acceleration
Language: Cuda
Pzhang266/libfacedetection
An open source library for face detection in images. The face detection speed can reach 1000FPS.
Language: C++
Pzhang266/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Language: Python
Pzhang266/ML-NLP
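This project covers the knowledge points and code implementations commonly tested in Machine Learning, Deep Learning, and NLP interviews, which also form the theoretical foundation every algorithm engineer should master.
Language: Jupyter Notebook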
Pzhang266/MTFAA-Net
Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement
Language: Python
Pzhang266/Neural-Speech-Dereverberation
Machine and Deep Learning models for speech dereverberation
Language: Python
Pzhang266/open_flamingo
An open-source framework for training large multimodal models.
Language: Python
Pzhang266/ParallelWaveGAN
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
Language: Jupyter Notebook
Pzhang266/pedalboard
🎛 🔊 A Python library for adding effects to audio.
Language: C++
Pzhang266/Prompt-Engineering-Guide
🐙 Guides, papers, lecture, notebooks and resources for prompt engineering
Language: MDX
Pzhang266/PseudoBinaural_CVPR2021
Codebase for the paper "Visually Informed Binaural Audio Generation without Binaural Audios" (CVPR 2021)
Language: Python
Pzhang266/pyaec
A simple and efficient Python implementation of a series of adaptive filters for acoustic echo cancellation, including time-domain adaptive filters (LMS, NLMS, RLS, AP, Kalman), nonlinear adaptive filters (Volterra filter, functional link adaptive filters), and frequency-domain adaptive filters (frequency-domain adaptive filter, frequency-domain Kalman filter).
Language: Python
Pzhang266/pysepm
Python implementation of performance metrics in Loizou's Speech Enhancement book
Language: Python
Pzhang266/s3prl
Self-Supervised Speech/Sound Pre-training and Representation Learning Toolkit
Language: Python
Pzhang266/solo-learn
solo-learn: a library of self-supervised methods for visual representation learning powered by Pytorch Lightning
Language: Python
Pzhang266/SoundSourceSeparation
The code for multi-channel source separation and dereverberation such as FastMNMF1, FastMNMF2, and AR-FastMNMF2.
Language: Python
Pzhang266/speechmetrics
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
Language: Python
Pzhang266/stablediffusion
High-Resolution Image Synthesis with Latent Diffusion Models
Language: Python
Pzhang266/UnsupSeg
Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation (INTERSPEECH 2020)
Language: Python
Pzhang266/VisualVoice
Audio-Visual Speech Separation with Cross-Modal Consistency
Language: Python
Pzhang266/wesper-demo
Language: Python
Pzhang266/ZQCNN
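A forward (inference) library faster than mini-caffe. Please star it if you find it useful: at 400 stars a fast face detection model will be released, at 500 stars a 106-point landmark model, at 600 stars a head detection model, at 700 stars a face detection bundle (six pnet and two rnet variants that can be mixed freely to meet various speed/accuracy requirements), and at 800 stars a more accurate 106-point model.
Language: C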