
Project Submission for the class "Reinforcement Learning"


Reinforcement Learning Project

Final project - automated audio captioning


Requirements

  • Python >= 3.5 (3.6 recommended)
  • PyTorch >= 0.4 (1.2 recommended)
  • tqdm (optional, used by test.py)
  • tensorboard >= 1.14 (see Tensorboard Visualization)

Folder Structure

├── train.py - main script to start training
├── test.py - evaluation of trained model
├── config.json - holds configuration for training
├── parse_config.py - class to handle config file and cli options
├── base/ - abstract base classes
│   ├── base_data_loader.py
│   ├── base_model.py
│   └── base_trainer.py
├── data_loader/ - anything about data loading goes here
│   └── data_loaders.py
├── data/ - default directory for storing input data
├── model/ - models
│   ├── model.py
│   ├── metric.py
│   └── ...
├── metric/ - metrics
│   ├── metric.py
│   └── eval_metrics.py
├── saved/
│   ├── models/ - trained models are saved here
│   └── log/ - default logdir for tensorboard and logging output
├── trainer/ - trainers
│   ├── trainer.py
│   └── loss.py
├── logger/ - module for tensorboard visualization and logging
│   ├── visualization.py
│   ├── logger.py
│   └── logger_config.json
└── utils/ - small utility functions
    ├── util.py
    └── ...


Preprocess

  python clotho2pann.py --cfg data_setting.yaml

Train

  python train.py -c policy.json
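The `-c policy.json` argument suggests that train.py reads its settings from a JSON file named on the command line. The following is a minimal sketch of how such a flag could be handled with the standard library; the function name `load_config` and the long option `--config` are assumptions for illustration, not the project's actual parse_config.py.

```python
import argparse
import json
from pathlib import Path


def load_config(argv=None):
    """Parse a `-c` option and return the JSON config file as a dict.

    `load_config` and the `--config` alias are hypothetical names used
    only for this sketch.
    """
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    parser.add_argument(
        "-c", "--config",
        required=True,
        type=Path,
        help="path to a .json config file, e.g. policy.json",
    )
    # argv=None makes argparse fall back to sys.argv[1:].
    args = parser.parse_args(argv)
    with args.config.open("r", encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `load_config(["-c", "policy.json"])` returns the parsed dictionary. Note that `json.load` rejects `//` comments, so an actual config file must not contain them.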

Config file format

Config files are in .json format:

{
  "name": "Mnist_LeNet",        // training session name
  "n_gpu": 1,                   // number of GPUs to use for training
  "arch": {
    "type": "MnistModel",       // name of model architecture to train
    "args": {}
  },
  "data_loader": {
    "type": "MnistDataLoader",         // selecting data loader
    "args": {
      "data_dir": "data/",             // dataset path
      "batch_size": 64,                // batch size
      "shuffle": true,                 // shuffle training data before splitting
      "validation_split": 0.1,         // size of validation dataset. float (portion) or int (number of samples)
      "num_workers": 2                 // number of CPU processes to be used for data loading
    }
  },
  "optimizer": {
    "type": "Adam",
    "args": {
      "lr": 0.001,                     // learning rate
      "weight_decay": 0,               // (optional) weight decay
      "amsgrad": true
    }
  },
  "loss": "nll_loss",                  // loss
  "metrics": [
    "accuracy", "top_k_acc"            // list of metrics to evaluate
  ],
  "lr_scheduler": {
    "type": "StepLR",                  // learning rate scheduler
    "args": {
      "step_size": 50,
      "gamma": 0.1
    }
  },
  "trainer": {
    "epochs": 100,                     // number of training epochs
    "save_dir": "saved/",              // checkpoints are saved in save_dir/models/name
    "save_freq": 1,                    // save checkpoints every save_freq epochs
    "verbosity": 2,                    // 0: quiet, 1: per epoch, 2: full
    "monitor": "min val_loss",         // mode and metric for model performance monitoring. set 'off' to disable.
    "early_stop": 10,                  // number of epochs to wait before early stop. set 0 to disable.
    "tensorboard": true                // enable tensorboard visualization
  }
}
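The `monitor` and `early_stop` settings described above can be illustrated with a small sketch. It assumes that training stops once the monitored metric has failed to improve for `early_stop` consecutive epochs; the exact threshold used by the project's trainer may differ. The class name `EarlyStopMonitor` and its methods are hypothetical and are not taken from trainer.py.

```python
class EarlyStopMonitor:
    """Track one metric and signal when it has stopped improving.

    Hypothetical helper for illustration; not the project's trainer code.
    """

    def __init__(self, monitor="min val_loss", early_stop=10):
        if monitor == "off":
            # Monitoring disabled: update() never requests a stop.
            self.mode, self.metric = "off", None
        else:
            # "min val_loss" -> mode "min", metric "val_loss"
            self.mode, self.metric = monitor.split()
            if self.mode not in ("min", "max"):
                raise ValueError("monitor mode must be 'min' or 'max'")
        # early_stop == 0 disables stopping (infinite patience).
        self.patience = early_stop if early_stop > 0 else float("inf")
        self.best = float("inf") if self.mode == "min" else float("-inf")
        self.stale_epochs = 0

    def update(self, log):
        """Record one epoch's metrics; return True if training should stop."""
        if self.mode == "off":
            return False
        value = log[self.metric]
        if self.mode == "min":
            improved = value < self.best
        else:
            improved = value > self.best
        if improved:
            self.best = value
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        # Assumption: stop after `patience` epochs without improvement.
        return self.stale_epochs >= self.patience
```

In a training loop, `update` would be called once per epoch with that epoch's metrics, for example `monitor.update({"val_loss": 0.42})`, and the loop would break when it returns `True`.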


TODOs

  • Fix the algorithm so that training converges properly
  • Add a discriminator for the CGAN system


License

This project is licensed under the MIT License. See LICENSE for more details.