This is the code for my master's thesis, which builds on my two previous works (MAAC and FeatureCut).
- Clone this repository:
https://github.com/Vancause/KAMA_AC.git
- Install PyTorch >= 1.8.0.
- Use pip to install the remaining dependencies (an optional environment check follows):
`pip install -r requirements.txt`
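After installing, you can confirm the environment with a quick check:

```python
# Optional sanity check: confirm the installed PyTorch meets the requirement.
import torch

print(torch.__version__)          # should be 1.8.0 or newer
print(torch.cuda.is_available())  # True if a CUDA-enabled build and GPU are present
```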
- Download the Clotho dataset for the DCASE2021 Automated Audio Captioning challenge. For how to prepare the training data and set up coco caption, please refer to the DCASE2020 BUPT team's repository.
- Enter the `audio_tag` directory.
- First, run `python generate_word_list.py` to create the word list `word_list_pretrain_rules.p` and the mapping from tagging words to embedding-layer indexes, `TaggingtoEmbs`.
- Then run `python generate_tag.py` to generate `audioTagName_{development/validation/evaluation}_fin_nv.pickle` and `audioTagNum_{development/validation/evaluation}_fin_nv.pickle`; a quick inspection sketch follows below.
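To check that the generation step worked, here is a minimal inspection sketch. The paths are assumptions (the files are assumed to sit in the `audio_tag` directory), and the internal structure of the pickles depends on the scripts, so only types are printed:

```python
# Minimal sketch: peek at the generated word list and tag files.
# Paths below are assumptions; adjust them to wherever the scripts
# actually write their output.
import pickle

with open('word_list_pretrain_rules.p', 'rb') as f:
    word_list = pickle.load(f)
print('word list container:', type(word_list))

with open('audioTagName_development_fin_nv.pickle', 'rb') as f:
    tag_names = pickle.load(f)
print('tag container:', type(tag_names))
```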
- Download the dataset from https://github.com/XinhaoMei/ACT.
- Generate the `.npy` files by running `generate_audiocaps_files.py`; a loading sketch follows below.
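A short sketch to verify that a generated file loads correctly. The file name is a placeholder, not a name the script is known to produce:

```python
# Minimal sketch: load one of the generated .npy feature files with NumPy.
# 'example_feature.npy' is a placeholder; substitute a real output file.
import numpy as np

features = np.load('example_feature.npy', allow_pickle=True)  # allow_pickle in case of object arrays
print(type(features))
print(getattr(features, 'shape', None))  # shape if it is a plain ndarray
```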
The training configuration is stored in `hparams.py`, and you can change it to your own parameters.
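The exact settings live in the repository's `hparams.py`; the sketch below is purely illustrative of the usual pattern, and the attribute names and values here are assumptions, not the repo's actual defaults:

```python
# Hypothetical illustration of a typical hparams.py layout.
# The real attribute names and defaults are defined in the repository.
class HParams:
    dataset = 'clotho'   # assumed option name; e.g. 'clotho' or 'audiocaps'
    batch_size = 32      # example value only
    lr = 3e-4            # example value only
    max_epochs = 30      # example value only

hparams = HParams()

# Training scripts would then read e.g. hparams.batch_size; editing such
# values is how you reset the configuration to your own parameters.
```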
- Run `python run_newtransformer.py` to train the KAMA-AC-T model.
- Run `python run_lstm.py` to train the KAMA-AC-L model.
- In `run_lstm.py` or `run_newtransformer.py`, you can modify the hyper-parameters directly to run the ablations.
- Run `python run_featurecut.py` to train the KAMA-AC-L model with FeatureCut.