Pzhang266
Universal Audio Processing (denoising, source separation, dereverberation ...)
Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China
Pinned Repositories
acoustic-scene-analysis-with-multihead-self-attention
This repo contains an implementation of the paper "Acoustic Scene Analysis With Multihead Self Attention" by Weimin Wang, Weiran Wang, Ming Sun, and Chao Wang from the Amazon Alexa team
AEC-Challenge
AEC Challenge
audiosetdl
Scripts for downloading AudioSet
av-se
Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
avobjects
Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video"
Awesome-Multimodal-Large-Language-Models
Latest Advances on Multimodal Large Language Models
Awesome-Speech-Enhancement
A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.
coder2gwy
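The internet's first guide for programmers taking the civil service exam, jointly presented by three former big-tech programmers who have since joined the public sector.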
DeepComplexCRN
Optical-Flow-Guided-Feature
Implementation Code of the paper Optical Flow Guided Feature, CVPR 2018
Pzhang266's Repositories
Pzhang266/avobjects
Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video"
Pzhang266/Optical-Flow-Guided-Feature
Implementation Code of the paper Optical Flow Guided Feature, CVPR 2018
Pzhang266/acoustic-scene-analysis-with-multihead-self-attention
This repo contains an implementation of the paper "Acoustic Scene Analysis With Multihead Self Attention" by Weimin Wang, Weiran Wang, Ming Sun, and Chao Wang from the Amazon Alexa team
Pzhang266/audiosetdl
Scripts for downloading AudioSet
Pzhang266/av-se
Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
Pzhang266/hifi-gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Pzhang266/Independent_Component_Analysis
From-scratch Python implementation of the FastICA algorithm.
Pzhang266/lip-reading-deeplearning
Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
Pzhang266/MTAdam
MTAdam: Automatic Balancing of Multiple Training Loss Terms
Pzhang266/pyflow
Fast, accurate and easy to run dense optical flow with python wrapper
Pzhang266/TCDTIMITprocessing
Processing and extraction of face and mouth image files from the TCDTIMIT database
Pzhang266/WASE
PyTorch implementation of WASE described in our paper "Wase: Learning When to Attend for Speaker Extraction in Cocktail Party Environments"