KAMA_AC

Primary language: Python

This repository contains the code of my master's thesis, which builds on my two previous works, MAAC and FeatureCut.

Setting up the Code and Environment

  1. Clone this repository: https://github.com/Vancause/KAMA_AC.git
  2. Install PyTorch >= 1.8.0
  3. Use pip to install dependencies: pip install -r requirements.txt
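
After installing, it can help to confirm that the key dependencies are importable. A minimal, hedged sketch (the package names below are assumptions based on a typical PyTorch project, not the full contents of requirements.txt):

```python
import importlib.util

# Hypothetical post-install sanity check: report which of the assumed
# dependencies cannot be imported. Extend `deps` with the packages
# actually listed in requirements.txt.
deps = ["torch", "numpy"]
missing = [d for d in deps if importlib.util.find_spec(d) is None]
print("missing packages:", missing)
```

If the list is non-empty, re-run `pip install -r requirements.txt` in the active environment.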

Preparing the data

Clotho

  • Download the Clotho dataset for the DCASE 2021 Automated Audio Captioning challenge. For how to prepare the training data and set up coco-caption, please refer to the DCASE 2020 BUPT team's repository.
  • Enter the audio_tag directory.
  • First, run python generate_word_list.py to create the word list word_list_pretrain_rules.p and to map the tagging words to embedding-layer indexes (TaggingtoEmbs).
  • Then run python generate_tag.py to generate audioTagName_{development/validation/evaluation}_fin_nv.pickle and audioTagNum_{development/validation/evaluation}_fin_nv.pickle.
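
To illustrate, the generated files are Python pickles; a minimal sketch of writing and reading one such word-to-embedding-index mapping (the dictionary layout and the example words here are assumptions for illustration, not the actual output of the scripts):

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical word list mapping each tagging word to an index in the
# embedding layer; the real word_list_pretrain_rules.p may differ.
word_list = {"dog": 0, "bark": 1, "rain": 2, "street": 3}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "word_list_pretrain_rules.p"
    with open(path, "wb") as f:
        pickle.dump(word_list, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded["bark"])
```

Loading the audioTagName/audioTagNum pickles for inspection follows the same pickle.load pattern.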

AudioCaps

Configuration

The training configuration is stored in hparams.py; you can adjust its parameters there to fit your own setup.
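
As a sketch of the pattern, an hparams.py-style file usually collects all hyper-parameters in one object. The attribute names and default values below are illustrative assumptions, not the repository's actual settings:

```python
# Illustrative hparams.py-style configuration object. The attribute
# names and defaults are invented for this example.
class HParams:
    def __init__(self, **overrides):
        # Plausible-looking defaults for an audio-captioning model.
        self.batch_size = 32
        self.lr = 3e-4
        self.epochs = 30
        self.model = "KAMA-AC-T"
        # Allow an experiment to override any known default, while
        # rejecting typos in hyper-parameter names.
        for key, value in overrides.items():
            if not hasattr(self, key):
                raise AttributeError(f"unknown hyper-parameter: {key}")
            setattr(self, key, value)

hparams = HParams(lr=1e-4)
print(hparams.lr, hparams.batch_size)
```

Rejecting unknown keys at construction time catches misspelled overrides before a long training run starts.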

Train the KAMA-AC model

  • Run python run_newtransformer.py to train the KAMA-AC-T model.
  • Run python run_lstm.py to train the KAMA-AC-L model.
  • In run_newtransformer.py or run_lstm.py, you can modify the hyper-parameters directly to run the ablation studies.
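
One common way to organize such ablations is to loop over a base configuration with per-run overrides. A hedged sketch (the setting names are invented for illustration and are not the scripts' real hyper-parameter names):

```python
# Hypothetical ablation sweep: each override dict changes one setting
# of a base configuration. Keys are illustrative only.
base = {"use_keyword_attention": True, "decoder": "transformer", "lr": 3e-4}

ablations = [
    {},                                # full model, unchanged
    {"use_keyword_attention": False},  # ablate the keyword module
    {"lr": 1e-4},                      # lower learning rate
]

runs = []
for override in ablations:
    cfg = {**base, **override}  # later dict wins on conflicts
    runs.append(cfg)

print(len(runs), runs[1]["use_keyword_attention"])
```

Each cfg would then be passed to a single training call, so every ablation is reproducible from one file.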

Train the KAMA-AC model with FeatureCut

  • Run python run_featurecut.py to train the KAMA-AC-L model with FeatureCut.
  • In run_lstm.py or run_newtransformer.py, you can modify the hyper-parameters directly to run the ablation studies.
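
If FeatureCut resembles a cutout-style augmentation on audio feature sequences (this is an assumption on my part; see the thesis and run_featurecut.py for the actual method), its core operation might be sketched as:

```python
import random

def feature_cut(features, max_span=10, seed=None):
    """Zero out one contiguous span of time frames.

    Hypothetical cutout-style sketch -- NOT necessarily the thesis's
    actual FeatureCut algorithm. `features` is a list of per-frame
    feature vectors.
    """
    rng = random.Random(seed)
    n = len(features)
    span = rng.randint(1, min(max_span, n))   # length of the cut span
    start = rng.randint(0, n - span)          # where the span begins
    cut = list(features)                      # shallow copy; input kept intact
    for t in range(start, start + span):
        cut[t] = [0.0] * len(cut[t])          # mask the frame
    return cut

frames = [[1.0, 2.0]] * 20
out = feature_cut(frames, max_span=5, seed=0)
print(sum(1 for f in out if f == [0.0, 0.0]))
```

The fixed seed makes the masking reproducible during debugging; in training one would draw a fresh span per batch.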