/ls-hw1

HW1: 11-775 Large-Scale Multimedia Analysis, Spring 2022

Instructions for HW1 (Youngin Kim, youngin2)

0. Prerequisites

Edit config.sh to set your own paths:

  • BASE_DIR : Base directory where your code is saved.
  • DATA_DIR : Data directory where your data is saved.
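
As a rough illustration of the two variables above, a config.sh might look like the following. This is only a sketch: both paths are placeholders chosen for the example, and the actual config.sh in the repository may define further variables.

```shell
# config.sh (illustrative sketch; both paths are placeholders, not the author's values)
BASE_DIR="$HOME/ls-hw1"      # assumed location of the checked-out code
DATA_DIR="$HOME/data/hw1"    # assumed location of the videos and extracted features
export BASE_DIR DATA_DIR     # make both visible to scripts started from this shell
```

Other scripts could then pick up the values with `source config.sh`.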

Dependencies: FFmpeg; Python packages: scikit-learn (sklearn), pandas

# [OPTIONAL] create conda environment
$ conda create -n myenv python=3.8
$ conda activate myenv

# install FFmpeg
$ apt-get install ffmpeg

# install PyTorch following the official instructions:
# https://pytorch.org/get-started/
# e.g. conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

# install requirements
$ pip install -r requirements.txt

Dependencies: OpenSMILE

$ bash sh/install_opensmile.sh

1. Data and Labels

  1. Download the video data (the same data used by the baseline code the TA provided).
  2. Uncompress the data into the folder you use.
  3. Split the data into K folds
$ bash sh/split.sh
  4. Extract the audio (.wav and .mp3) from the videos
  • .wav: for MFCC-Bag-Of-Feature
  • .mp3: for SoundNet-Global-Pool (sampling rate: 22050 Hz)
$ bash sh/extract_audio.sh
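
The audio extraction step can be sketched roughly as below. This is not the content of sh/extract_audio.sh: the directory layout (videos/, wav/, mp3/), the .mp4 extension, and the helper names audio_path and extract_all are assumptions made for illustration. Only the 22050 Hz rate for the .mp3 files comes from the description above.

```shell
# Sketch only: directory names and the .mp4 extension are assumptions.
DATA_DIR="${DATA_DIR:-./data}"

# Map a video path to the audio path for the given extension (wav or mp3).
audio_path() {
  local video="$1" ext="$2"
  local name
  name="$(basename "${video%.*}")"   # strip the directory and the extension
  echo "${DATA_DIR}/${ext}/${name}.${ext}"
}

# Extract one .wav and one .mp3 per video.
extract_all() {
  mkdir -p "${DATA_DIR}/wav" "${DATA_DIR}/mp3"
  for video in "${DATA_DIR}"/videos/*.mp4; do
    [ -e "$video" ] || continue      # skip if the glob matched nothing
    # -vn drops the video stream; -y overwrites existing output
    ffmpeg -loglevel error -y -i "$video" -vn "$(audio_path "$video" wav)"
    # -ar 22050 resamples to the rate used for the SoundNet input
    ffmpeg -loglevel error -y -i "$video" -vn -ar 22050 "$(audio_path "$video" mp3)"
  done
}

# Run only when ffmpeg is installed and the assumed input folder exists.
if command -v ffmpeg >/dev/null 2>&1 && [ -d "${DATA_DIR}/videos" ]; then
  extract_all
fi
```

Keeping the path mapping in its own function makes it easy to check where files will land before running ffmpeg on the whole dataset.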

2. Feature Extractor

  1. MFCC-Bag-Of-Feature
$ bash sh/mfcc.sh
  2. SoundNet-Global-Pool
$ bash sh/soundnet.sh

3. Classifier results

  1. mfcc.csv: MFCC-Bag-Of-Feature + SVM classifier
$ bash sh/mfcc_svm.sh
  2. soundnet.csv: SoundNet (y_scns) + MLP classifier
$ bash sh/soundnet_mlp.sh
  3. best.csv: AST (Audio Spectrogram Transformer) + linear head
$ bash sh/best.sh
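
For convenience, the scripts listed in this README could be chained in one driver, sketched below. The driver is hypothetical and not part of the repository. It assumes the scripts are run from the repository root in the order they appear here, and it defaults to a dry run that only prints the commands.

```shell
# Hypothetical driver (not in the repository): runs the documented scripts in README order.
STEPS="sh/split.sh sh/extract_audio.sh sh/mfcc.sh sh/soundnet.sh sh/mfcc_svm.sh sh/soundnet_mlp.sh sh/best.sh"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; set DRY_RUN=0 to execute them

run_pipeline() {
  for step in $STEPS; do
    echo "==> bash ${step}"
    if [ "$DRY_RUN" != "1" ]; then
      bash "$step" || return 1   # stop at the first failing step
    fi
  done
}

run_pipeline
```

Running it with `DRY_RUN=0` executes every step, so it may be worth checking the printed order first.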