Sound Classification

A ROS package for classifying a sound stream.

Setup

  1. Install ROS. Supported OS:

    • Ubuntu 16.04 (?)
    • Ubuntu 18.04
  2. Create workspace

    mkdir ~/sound_classification_ws/src -p
    cd ~/sound_classification_ws/src
    git clone https://github.com/708yamaguchi/sound_classification.git
    # Note: once `audio_to_spectrogram` is released, stop building it from source.
    git clone https://github.com/708yamaguchi/jsk_recognition.git -b multi-channel
    rosdep install --from-paths . --ignore-src -y -r
    cd ..
    catkin build sound_classification
    source ~/sound_classification_ws/devel/setup.bash
  3. Install other packages.

Usage

  1. Set your microphone parameters in the arg tags of audio_to_spectrogram.launch.

    • In particular, device, n_channel, bitdepth and sample_rate need to be specified.
    • Example commands for finding these parameters are below:
      # For device. In this example, card 0 and device 0, so device:="hw:0,0"
      $ arecord -l
      **** List of CAPTURE Hardware Devices ****
      card 0: PCH [HDA Intel PCH], device 0: ALC293 Analog [ALC293 Analog]
      Subdevices: 1/1
      Subdevice #0: subdevice #0
      # For n_channel, bitdepth and sample_rate,
      $ pactl list short sinks
      0       alsa_output.pci-0000_00_1f.3.analog-stereo      module-alsa-card.c      s16le 2ch 44100Hz   SUSPENDED
    • If you use the /audio topic from another computer and do not want to publish /audio yourself, set use_microphone:=false in each launch file.
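    • For a quick test without editing the launch file, the same args can also be overridden on the command line. The values below come from the example outputs above (s16le corresponds to bitdepth 16):
      $ roslaunch sound_classification audio_to_spectrogram.launch \
        device:="hw:0,0" n_channel:=2 bitdepth:=16 sample_rate:=44100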
  2. Save environmental noise to train_data/noise.npy.

    • Subtracting this noise makes the spectrograms cleaner (a minimal sketch of the idea appears below).
    • While this script runs, do not make any sound near the sensor.
    • Update the noise data every time before sound recognition, because the environmental noise differs every time.
    • About 30 noise samples are enough.
      $ roslaunch sound_classification save_noise.launch
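    • The package's exact noise handling is not documented here; as a minimal sketch of the idea in Python, assuming noise.npy stores one spectrum per row:

      import numpy as np

      # Assumption: train_data/noise.npy holds one noise spectrum per row.
      noise = np.load('train_data/noise.npy')
      noise_mean = noise.mean(axis=0)  # average noise spectrum

      def subtract_noise(spectrum):
          # Subtract the average noise and clip negative values to zero.
          return np.maximum(spectrum - noise_mean, 0.0)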
  3. Publish audio -> spectrum -> spectrogram topics.

    • You can set the max/min frequency included in the spectrum with the high_cut_freq/low_cut_freq args in audio_to_spectrogram.launch (see the sketch at the end of this step).

    • If gui:=true, spectrum and spectrogram are visualized.

      $ roslaunch sound_classification audio_to_spectrogram.launch gui:=true
    • Here is an example spectrogram in a quiet environment.

      • Horizontal axis is time [s]
      • Vertical axis is frequency [Hz]
      (Figures: spectrogram w/o noise subtraction vs. spectrogram w/ noise subtraction)
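    • The band-limiting behind high_cut_freq/low_cut_freq can be sketched with numpy as below. This is an illustration, not the node's actual code, and the cut-off values are assumed (a spectrogram is such spectra stacked over successive frames):

      import numpy as np

      sample_rate = 44100  # from the audio_to_spectrogram.launch args
      low_cut_freq = 20    # assumed example values [Hz]
      high_cut_freq = 8000

      def audio_to_spectrum(frame):
          # FFT one audio frame and keep only the [low_cut, high_cut] band.
          spectrum = np.abs(np.fft.rfft(frame))
          freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
          band = (freqs >= low_cut_freq) & (freqs <= high_cut_freq)
          return freqs[band], spectrum[band]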
  4. Collect the spectrograms you would like to classify under train_data/original_spectrogram/TARGET_CLASS.

    1. Rosbag version (Recommended)

      • I recommend using rosbag to collect spectrograms. Recording to a rosbag makes it easy to rerun save_sound.launch with different parameters.
      • Set the class name of your target sound with target_class:=TARGET_CLASS.
      • With use_rosbag:=true and filename:=PATH_TO_ROSBAG, you can save spectrograms from a rosbag.
      • By default, the rosbag is paused at first. Press the 'Space' key in the terminal to start playing it; when it ends, press 'Ctrl-c' to terminate.
      • The newly saved spectrograms are appended to existing spectrograms.
      • You can change the saving threshold with threshold:=xxx. The smaller the value, the more easily sound is saved (see the note at the end of this step).
        # Save audio to rosbag
        $ roslaunch sound_classification record_audio_rosbag.launch filename:=PATH_TO_ROSBAG
        # Play rosbag and collect data
        $ export ROS_MASTER_URI=http://localhost:11311
        $ roslaunch sound_classification save_sound.launch use_rosbag:=true \
          filename:=PATH_TO_ROSBAG target_class:=TARGET_CLASS threshold:=0.5
      • By setting threshold:=0 and save_when_sound:=false, you can collect spectrograms of the "no sound" class.
        # Play rosbag and collect no-sound data
        $ export ROS_MASTER_URI=http://localhost:11311
        $ roslaunch sound_classification save_sound.launch use_rosbag:=true \
          filename:=PATH_TO_ROSBAG target_class:=no_sound threshold:=0 save_when_sound:=false
    2. Stream version (Not Recommended)

      • You can collect spectrograms directly from the live audio topic stream.
      • Do not set use_rosbag:=true. The other args are the same as in the rosbag version; see above.
        $ roslaunch sound_classification save_sound.launch \
        save_when_sound:=true target_class:=TARGET_CLASS threshold:=0.5 save_data_rate:=5
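    • A note on the threshold semantics (an assumption; the exact criterion is not documented here): saving appears to be gated on whether the current sound level exceeds the threshold, e.g.:

      def sound_detected(level, threshold):
          # Hypothetical gate: a frame counts as 'sound' when its level
          # exceeds the threshold, so smaller thresholds save more frames.
          return level > threshold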
  5. Create a dataset for Chainer from the saved spectrograms.

    • Some data augmentation is executed (an illustrative sketch appears below).
    • --number 30 means that at most 30 images per class are used in the dataset.
      $ rosrun sound_classification create_dataset.py --number 30
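    • The applied augmentations are not specified here; purely as an illustration, a common spectrogram augmentation is a small random shift along the time axis:

      import numpy as np

      def random_time_shift(spec, max_shift=5):
          # Illustrative augmentation only: roll the spectrogram a few
          # frames along the time axis (axis 1 is assumed to be time).
          shift = np.random.randint(-max_shift, max_shift + 1)
          return np.roll(spec, shift, axis=1)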
  6. Visualize dataset.

    • Use the train arg for the training dataset (augmented) and the test arg for the test dataset.
    • The spectrograms in the dataset are visualized in random order.
      $ rosrun sound_classification visualize_dataset.py test # train/test
  7. Train with dataset.

    • The default model is NIN (recommended).
    • If you use vgg16, the pretrained VGG16 weights are downloaded to scripts/VGG_ILSVRC_16_layers.npz the first time you run this script.
      $ rosrun sound_classification train.py --epoch 30
  8. Classify sounds.

    • It takes a few seconds for the neural network weights to be loaded.
    • use_rosbag:=true and filename:=PATH_TO_ROSBAG are available if you classify sound from a rosbag.
      $ roslaunch sound_classification classify_sound.launch
    • You can fix the colors of the class names in the classification result image by specifying the order of class names like below:
      <rosparam>
        target_names: [none, other, chip_bag]
      </rosparam>
    • Example classification result:
      (Example result images: no_sound, applause, voice)

Quick demo

A sound classification demo with your laptop's built-in microphone. You can create a dataset from the rosbag files in the sample_rosbag/ directory.

(Classification example GIF)

Commands

  1. Set up the environment and write the microphone settings (step 1 of the Usage section)

  2. Save environmental noise

    $ roslaunch sound_classification save_noise.launch
  3. Collect spectrograms from the sample rosbags. Press 'Space' to start each rosbag.

    • For no_sound class
      $ roslaunch sound_classification save_sound.launch use_rosbag:=true \
      filename:=$(rospack find sound_classification)/sample_rosbag/no_sound.bag \
      target_class:=no_sound threshold:=0 save_when_sound:=false
    • For applause class
      $ roslaunch sound_classification save_sound.launch use_rosbag:=true \
      filename:=$(rospack find sound_classification)/sample_rosbag/applause.bag \
      target_class:=applause threshold:=0.5
    • For voice class
      $ roslaunch sound_classification save_sound.launch use_rosbag:=true \
      filename:=$(rospack find sound_classification)/sample_rosbag/voice.bag \
      target_class:=voice threshold:=0.5
  4. Create dataset

    $ rosrun sound_classification create_dataset.py --number 30
  5. Train (takes ~10 minutes)

    $ rosrun sound_classification train.py --epoch 20
  6. Classify sound

    $ roslaunch sound_classification classify_sound.launch