Final project - automated audio captioning
- Python >= 3.5 (3.6 recommended)
- PyTorch >= 0.4 (1.2 recommended)
- tqdm (optional, for test.py)
- tensorboard >= 1.14 (see Tensorboard Visualization)
rl_aac/
│
├── train.py - main script to start training
├── test.py - evaluation of trained model
│
├── config.json - holds configuration for training
├── parse_config.py - class to handle config file and cli options
│
├── base/ - abstract base classes
│ ├── base_data_loader.py
│ ├── base_model.py
│ └── base_trainer.py
│
├── data_loader/ - anything about data loading goes here
│ └── data_loaders.py
│
├── data/ - default directory for storing input data
│
├── model/ - models
│ ├── model.py
│ ├── metric.py
│ └──
├── metric/ - metrics
│ ├── metric.py
│ └── eval_metrics.py
│
├── saved/
│ ├── models/ - trained models are saved here
│ └── log/ - default logdir for tensorboard and logging output
│
├── trainer/ - trainers
│ ├── trainer.py
│ └── loss.py
│
├── logger/ - module for tensorboard visualization and logging
│ ├── visualization.py
│ ├── logger.py
│ └── logger_config.json
│
└── utils/ - small utility functions
    ├── util.py
    └── ...
Preprocess
python clotho2pann.py --cfg data_setting.yaml
Train
python train.py -c policy.json
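The -c option above points the training script at a config file. The following is a minimal sketch of how such an option might be consumed, assuming the file is plain JSON without comments. The function name load_config and the long form --config are hypothetical; the project's actual handling lives in parse_config.py and may differ.

```python
import argparse
import json
from pathlib import Path


def load_config(argv=None):
    """Parse -c/--config and return the JSON file's contents as a dict.

    Hypothetical helper for illustration; not the project's parse_config.py.
    """
    parser = argparse.ArgumentParser(description="Train an audio captioning model")
    parser.add_argument(
        "-c", "--config",
        required=True,
        type=Path,
        help="path to a JSON config file, e.g. policy.json",
    )
    args = parser.parse_args(argv)
    with args.config.open("r", encoding="utf-8") as f:
        return json.load(f)


# Usage (passing argv explicitly instead of reading sys.argv):
#   config = load_config(["-c", "policy.json"])
#   print(config["name"])
```

Passing argv explicitly keeps the helper easy to call from tests or notebooks; when argv is None, argparse falls back to the real command line.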
Config files are in .json format:
{
  "name": "Mnist_LeNet",        // training session name
  "n_gpu": 1,                   // number of GPUs to use for training
  "arch": {
    "type": "MnistModel",       // name of model architecture to train
    "args": {
    }
  },
  "data_loader": {
    "type": "MnistDataLoader",  // selecting data loader
    "args": {
      "data_dir": "data/",      // dataset path
      "batch_size": 64,         // batch size
      "shuffle": true,          // shuffle training data before splitting
      "validation_split": 0.1,  // size of validation dataset: float (portion) or int (number of samples)
      "num_workers": 2          // number of CPU processes used for data loading
    }
  },
  "optimizer": {
    "type": "Adam",
    "args": {
      "lr": 0.001,              // learning rate
      "weight_decay": 0,        // (optional) weight decay
      "amsgrad": true
    }
  },
  "loss": "nll_loss",           // loss
  "metrics": [
    "accuracy", "top_k_acc"     // list of metrics to evaluate
  ],
  "lr_scheduler": {
    "type": "StepLR",           // learning rate scheduler
    "args": {
      "step_size": 50,
      "gamma": 0.1
    }
  },
  "trainer": {
    "epochs": 100,              // number of training epochs
    "save_dir": "saved/",       // checkpoints are saved in save_dir/models/name
    "save_freq": 1,             // save checkpoints every save_freq epochs
    "verbosity": 2,             // 0: quiet, 1: per epoch, 2: full
    "monitor": "min val_loss",  // mode and metric for model performance monitoring; set 'off' to disable
    "early_stop": 10,           // number of epochs to wait before early stop; set 0 to disable
    "tensorboard": true         // enable tensorboard visualization
  }
}
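Several entries in the config above pair a "type" name with an "args" dictionary. One common way to turn such an entry into an object is to look the name up on a module and call it with the arguments. The sketch below illustrates that pattern under stated assumptions: init_obj, StepSchedule, and fake_module are hypothetical names introduced here, and the stand-in class replaces a real scheduler so the example runs without PyTorch. The project's own resolution logic may differ.

```python
import types


class StepSchedule:
    """Hypothetical stand-in for a scheduler class such as StepLR."""

    def __init__(self, step_size, gamma=0.1):
        self.step_size = step_size
        self.gamma = gamma


def init_obj(config, key, module, *args, **kwargs):
    """Instantiate module.<config[key]['type']> with config[key]['args'].

    Extra positional arguments are passed first; extra keyword arguments
    override values taken from the config.
    """
    spec = config[key]
    cls = getattr(module, spec["type"])
    merged = dict(spec.get("args", {}))
    merged.update(kwargs)
    return cls(*args, **merged)


# A namespace standing in for a real module (assumption for illustration).
fake_module = types.SimpleNamespace(StepLR=StepSchedule)

config = {
    "lr_scheduler": {
        "type": "StepLR",
        "args": {"step_size": 50, "gamma": 0.1},
    }
}

scheduler = init_obj(config, "lr_scheduler", fake_module)
print(type(scheduler).__name__, scheduler.step_size, scheduler.gamma)
```

With this pattern, swapping the optimizer, scheduler, or data loader only requires editing the "type" and "args" fields in the config, not the training code.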
- Fix the training algorithm so that it converges properly
- Add a discriminator for the CGAN system
This project is licensed under the MIT License. See LICENSE for more details.