Pinned Repositories
AudioCLIP
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)
ESResNet
Source code for models described in the paper "ESResNet: Environmental Sound Classification Based on Visual Domain Models" (https://arxiv.org/abs/2004.07301)
ESResNeXt-fbsp
Source code for models described in the paper "ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio" (https://arxiv.org/abs/2104.11587)
FPNet
A signal segmentation method of CNN for audio event classification
TFNet-for-Environmental-Sound-Classification
Learning discriminative and robust time-frequency representations for environmental sound classification: Convolutional neural networks (CNNs) are among the best-performing neural network architectures for environmental sound classification (ESC). Recently, attention mechanisms have been used in CNNs to capture the information in the audio signal that is useful for sound classification, especially for weakly labelled data, where the training data provides sound class labels but no timing information about the acoustic events. In these methods, however, the inherent time-frequency characteristics and variations are not explicitly exploited when obtaining the deep features. In this paper, we propose a new method, called the time-frequency enhancement block (TFBlock), in which temporal attention and frequency attention are employed to enhance the features from relevant frames and frequency bands. Compared with other attention mechanisms, our method constructs parallel branches that attend to the temporal and frequency features separately, in order to mitigate interference from sections of the acoustic environment where no sound events occur. Experiments on three benchmark ESC datasets show that our method improves classification performance and is robust to noise.
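The idea of parallel temporal and frequency attention branches can be sketched as follows. This is a minimal NumPy illustration under my own simplifying assumptions (mean pooling for the attention scores and averaging for the fusion), not the TFBlock implementation from the paper, which uses learned parameters inside a CNN:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tf_enhancement(feat):
    """Hypothetical simplification of a time-frequency enhancement block.

    `feat` is a (channels, freq_bins, time_frames) feature map. Two parallel
    branches re-weight it along the time axis and the frequency axis, so that
    frames and bands with little energy (e.g. silence) are de-emphasised, and
    the two branches are fused by averaging.
    """
    # Temporal branch: pool over channels and frequency, weight each frame.
    time_weights = softmax(feat.mean(axis=(0, 1)))      # shape (T,)
    temporal = feat * time_weights[None, None, :]

    # Frequency branch: pool over channels and time, weight each band.
    freq_weights = softmax(feat.mean(axis=(0, 2)))      # shape (F,)
    spectral = feat * freq_weights[None, :, None]

    # Fuse the two branches; output shape matches the input.
    return 0.5 * (temporal + spectral)
```

In the paper's formulation the attention weights are learned rather than derived from mean pooling; the sketch only shows why the two branches are kept parallel: each axis is attended independently, so a noisy frame does not suppress an informative frequency band and vice versa.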
AndreyGuzhov's Repositories
AndreyGuzhov/AudioCLIP
AndreyGuzhov/ESResNeXt-fbsp
AndreyGuzhov/ESResNet
AndreyGuzhov/FPNet
AndreyGuzhov/TFNet-for-Environmental-Sound-Classification