SoundNet_Pytorch

Converting the pretrained TensorFlow SoundNet model to PyTorch.

Primary language: Python. License: MIT.


[Figure from SoundNet]

Introduction

This code converts the pretrained TensorFlow SoundNet model into a PyTorch model. It does not include any training code for SoundNet. The pretrained PyTorch SoundNet model can be found here.

Prerequisites

  1. TensorFlow (CPU or GPU)
  2. Python 3.6 with NumPy
  3. PyTorch 0.4+
  4. Weight files: Google Drive: https://drive.google.com/drive/folders/1zjNiuLgZ1cjCzF80P4mlYe4KSGGOFlta?usp=sharing; Baidu Netdisk: https://pan.baidu.com/s/1v_K2pJvo0KE38EZ__WZJWg (extraction code: iz4h)

How to use

  1. Prepare the code:
git clone https://github.com/smallflyingpig/SoundNet_Pytorch.git
cd SoundNet_Pytorch
  2. Prepare the TensorFlow SoundNet model parameters: download sound8.npy (provided by eborboihuc) and save it in the current folder.
  3. Install the prerequisites.
  4. Run the conversion:
python tf2pytorch.py --tf_param_path ./sound8.npy --pytorch_param_path ./sound8.pth
  5. Test the result:

Download the input demo data (demo.npy) and save it to the current folder. The check script computes the average feature error at each of the 7 convolution blocks and at the 2 prediction layers for object and scene classification, for a total of 9 error values.

python check_layer.py --tf_param_path ./sound8.npy --pytorch_param_path ./sound8.pth --input_demo_data ./demo.npy

The expected output:

layer error:
[-1.3113022e-06, 0.0, 0.0, 0.0, 1.4901161e-08, 0.0, -6.9849193e-10, 4.7683716e-07, 7.1525574e-07]

All nine errors are on the order of 1e-6 or smaller, which indicates that the model conversion succeeded.

  6. Extract features: once the PyTorch model has been obtained (saved as ./sound8.pth), run the following command:
python example.py
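The conversion and check steps above can be sketched in NumPy as follows. This is not the code in tf2pytorch.py or check_layer.py. It assumes that the stored conv kernels use TensorFlow's conv2d filter layout [height, width, in_channels, out_channels], and that the "average feature error" is the signed mean of (TensorFlow output minus PyTorch output), which is consistent with the negative values in the expected output. The helper names tf_conv_to_torch and mean_layer_errors are hypothetical, and the demo data is random.

```python
import numpy as np

# Hypothetical helpers for illustration; not part of this repository.


def tf_conv_to_torch(kernel_hwio):
    """Reorder a TensorFlow conv kernel into PyTorch's weight layout.

    tf.nn.conv2d filters are [height, width, in_channels, out_channels];
    torch.nn.Conv2d weights are [out_channels, in_channels, height, width].
    """
    kernel_hwio = np.asarray(kernel_hwio)
    if kernel_hwio.ndim != 4:
        raise ValueError("expected a 4-D kernel, got shape %s" % (kernel_hwio.shape,))
    # (H, W, I, O) -> (O, I, H, W); copy so the result is contiguous in memory.
    return np.ascontiguousarray(kernel_hwio.transpose(3, 2, 0, 1))


def mean_layer_errors(tf_feats, torch_feats):
    """Signed mean difference (TF minus PyTorch) for each pair of layer outputs.

    Both lists must hold arrays of identical shape, i.e. TensorFlow's NHWC
    features must already be transposed to match PyTorch's NCHW layout.
    """
    if len(tf_feats) != len(torch_feats):
        raise ValueError("need the same number of layers from both models")
    errors = []
    for tf_out, pt_out in zip(tf_feats, torch_feats):
        tf_out, pt_out = np.asarray(tf_out), np.asarray(pt_out)
        if tf_out.shape != pt_out.shape:
            raise ValueError("shape mismatch: %s vs %s" % (tf_out.shape, pt_out.shape))
        errors.append(float(np.mean(tf_out - pt_out)))
    return errors


if __name__ == "__main__":
    rng = np.random.RandomState(0)

    # Random kernel shaped like a 1-D conv: 64 taps, 1 input and 16 output channels.
    kernel = rng.randn(64, 1, 1, 16).astype(np.float32)
    weight = tf_conv_to_torch(kernel)
    print(weight.shape)  # (16, 1, 64, 1)
    assert weight[3, 0, 10, 0] == kernel[10, 0, 0, 3]

    # Stand-in features for 9 layers: "PyTorch" outputs are TF outputs plus tiny noise.
    tf_feats = [rng.randn(1, 16, 100, 1).astype(np.float32) for _ in range(9)]
    pt_feats = [f + rng.randn(*f.shape).astype(np.float32) * 1e-7 for f in tf_feats]
    errors = mean_layer_errors(tf_feats, pt_feats)
    print("layer error:", errors)
    print("conversion OK:", all(abs(e) < 1e-5 for e in errors))
```

In a real converter the reordered arrays would then be copied into the PyTorch model's state_dict (for example via torch.from_numpy). One-dimensional parameters such as biases and batch-norm scale, offset, and moving statistics need no reordering.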

Acknowledgments

The code for the SoundNet TensorFlow model is ported from soundnet_tensorflow. Thanks for this work!

FAQs

Feel free to email me (jiguo.li@vipl.ict.ac.cn or jgli@pku.edu.cn) if you have any questions about this project.

Reference

  1. Yusuf Aytar, Carl Vondrick, and Antonio Torralba. "Soundnet: Learning sound representations from unlabeled video." Advances in Neural Information Processing Systems. 2016.