Pzhang266
Universal Audio Processing (denoising, source separation, dereverberation ...)
Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China
Pinned Repositories
acoustic-scene-analysis-with-multihead-self-attention
This repo contains an implementation of the paper "Acoustic Scene Analysis With Multihead Self Attention" by Weimin Wang, Weiran Wang, Ming Sun, and Chao Wang of the Amazon Alexa team
AEC-Challenge
AEC Challenge
audiosetdl
Scripts for downloading AudioSet
av-se
Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
avobjects
Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video"
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
Awesome-Speech-Enhancement
A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.
coder2gwy
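The internet's first guide to civil service exams for programmers, jointly presented by three former big-tech programmers who have already moved into the public sector.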
DeepComplexCRN
Optical-Flow-Guided-Feature
Implementation Code of the paper Optical Flow Guided Feature, CVPR 2018
Pzhang266's Repositories
Pzhang266/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
Pzhang266/AEC-Challenge
AEC Challenge
Pzhang266/coder2gwy
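The internet's first guide to civil service exams for programmers, jointly presented by three former big-tech programmers who have already moved into the public sector.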
Pzhang266/DeepXi
Deep Xi: A deep learning approach to a priori SNR estimation implemented in TensorFlow 2/Keras. For speech enhancement and robust ASR.
Pzhang266/dlib
A toolkit for making real world machine learning and data analysis applications in C++
Pzhang266/EMGFilters
Filter functions for processing EMG signals.
Pzhang266/fast_bss_eval
A fast implementation of bss_eval metrics for blind source separation
Pzhang266/FullSubNet
PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
Pzhang266/gpuRIR
Python library for Room Impulse Response (RIR) simulation with GPU acceleration
Pzhang266/libfacedetection
An open source library for face detection in images. The face detection speed can reach 1000FPS.
Pzhang266/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Pzhang266/ML-NLP
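This project collects the knowledge points and code implementations commonly tested in Machine Learning, Deep Learning, and NLP interviews; it also covers the theoretical fundamentals every algorithm engineer should know.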
Pzhang266/MTFAA-Net
Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement
Pzhang266/Neural-Speech-Dereverberation
Machine and Deep Learning models for speech dereverberation
Pzhang266/open_flamingo
An open-source framework for training large multimodal models.
Pzhang266/ParallelWaveGAN
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch
Pzhang266/pedalboard
🎛 🔊 A Python library for adding effects to audio.
Pzhang266/Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
Pzhang266/PseudoBinaural_CVPR2021
Codebase for the paper "Visually Informed Binaural Audio Generation without Binaural Audios" (CVPR 2021)
Pzhang266/pyaec
Simple and efficient Python implementation of a series of adaptive filters for acoustic echo cancellation, including time-domain adaptive filters (LMS, NLMS, RLS, AP, Kalman), nonlinear adaptive filters (Volterra filter, functional link adaptive filters), and frequency-domain adaptive filters (frequency-domain adaptive filter, frequency-domain Kalman filter).
Pzhang266/pysepm
Python implementation of performance metrics in Loizou's Speech Enhancement book
Pzhang266/s3prl
Self-Supervised Speech/Sound Pre-training and Representation Learning Toolkit
Pzhang266/solo-learn
solo-learn: a library of self-supervised methods for visual representation learning powered by PyTorch Lightning
Pzhang266/SoundSourceSeparation
The code for multi-channel source separation and dereverberation such as FastMNMF1, FastMNMF2, and AR-FastMNMF2.
Pzhang266/speechmetrics
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
Pzhang266/stablediffusion
High-Resolution Image Synthesis with Latent Diffusion Models
Pzhang266/UnsupSeg
Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation (INTERSPEECH 2020)
Pzhang266/VisualVoice
Audio-Visual Speech Separation with Cross-Modal Consistency
Pzhang266/wesper-demo
Pzhang266/ZQCNN
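A forward (inference) library faster than mini-caffe. If you find it useful, please star it: at 400 stars a fast face detection model will be released, at 500 stars a 106-point landmark model, at 600 stars a head detection model, at 700 stars a face detection bundle (six pnet and two rnet variants that can be mixed freely to meet various speed/accuracy requirements), and at 800 stars a more accurate 106-point model.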