Pinned Repositories
AudioCLIP
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)
ESResNet
Source code for models described in the paper "ESResNet: Environmental Sound Classification Based on Visual Domain Models" (https://arxiv.org/abs/2004.07301)
ESResNeXt-fbsp
Source code for models described in the paper "ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio" (https://arxiv.org/abs/2104.11587)
FPNet
A signal segmentation method of CNN for audio event classification
TFNet-for-Environmental-Sound-Classification
Learning discriminative and robust time-frequency representations for environmental sound classification: Convolutional neural networks (CNNs) are among the best-performing neural network architectures for environmental sound classification (ESC). Recently, attention mechanisms have been used in CNNs to capture the information in the audio signal that is useful for sound classification, especially for weakly labelled data, where the training data provides sound class labels but no timing information about the acoustic events. In these methods, however, the inherent time-frequency characteristics and variations are not explicitly exploited when obtaining the deep features. In this paper, we propose a new method, called the time-frequency enhancement block (TFBlock), in which temporal attention and frequency attention are employed to enhance the features from relevant frames and frequency bands. Compared with other attention mechanisms, our method constructs parallel branches that attend to the temporal and frequency features separately, in order to mitigate interference from sections of the acoustic environment where no sound events occur. Experiments on three benchmark ESC datasets show that our method improves classification performance and is robust to noise.
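The idea of parallel temporal and frequency attention branches can be sketched as follows. This is a minimal NumPy illustration under my own simplifying assumptions (mean pooling for the attention scores and averaging for the fusion), not the TFBlock implementation from the paper, which uses learned parameters inside a CNN:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tf_enhancement(feat):
    """Hypothetical simplification of a time-frequency enhancement block.

    `feat` is a (channels, freq_bins, time_frames) feature map. Two parallel
    branches re-weight it along the time axis and the frequency axis, so that
    frames and bands with little energy (e.g. silence) are de-emphasised, and
    the two branches are fused by averaging.
    """
    # Temporal branch: pool over channels and frequency, weight each frame.
    time_weights = softmax(feat.mean(axis=(0, 1)))      # shape (T,)
    temporal = feat * time_weights[None, None, :]

    # Frequency branch: pool over channels and time, weight each band.
    freq_weights = softmax(feat.mean(axis=(0, 2)))      # shape (F,)
    spectral = feat * freq_weights[None, :, None]

    # Fuse the two branches; output shape matches the input.
    return 0.5 * (temporal + spectral)
```

In the paper's formulation the attention weights are learned rather than derived from mean pooling; the sketch only shows why the two branches are kept parallel: each axis is attended independently, so a noisy frame does not suppress an informative frequency band and vice versa.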
AndreyGuzhov's Repositories
AndreyGuzhov/AudioCLIP
AndreyGuzhov/ESResNeXt-fbsp
AndreyGuzhov/ESResNet
AndreyGuzhov/FPNet
AndreyGuzhov/TFNet-for-Environmental-Sound-Classification