Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis

This repository contains the code used for the paper Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis.

Requirements

We recommend installing this code into a virtual environment. To run the code, first install PyTorch by following the instructions on the PyTorch website. Once this is done, you can install this package and its dependencies by running:

pip install cython
python setup.py install

The experiments in the original paper were run using the dataset found at the Karel dataset webpage. We recommend downloading and extracting it into the ./data directory.

Commands

The code is driven by two commands: train_cmd.py trains a model and eval_cmd.py evaluates one. This section introduces the main options; you can also use --help to see everything that is available.

Train

  • --kernel_size, --conv_stack, --fc_stack, --tgt_embedding_size, --lstm_hidden_size and --nb_lstm_layers specify the architecture of the model to learn. See nps/network.py to see how they are used. --nb_ios specifies how many of the IO pairs are used as inputs to the encoder (note that, due to the architecture, a model trained with a given number of IO pairs can still be used for prediction when a different number of IO pairs is available at test time).
  • --use_grammar makes the model use the handwritten syntax checker, found in syntax/checker.pyx. --learn_syntax adds a Syntax LSTM to the model that attempts to learn a syntax checker, jointly with the rest of the model. The importance of this objective is controlled by the --beta parameter.
  • --signal selects the loss: supervised, rl or beam_rl. supervised attempts to reproduce the ground truth program, while rl and beam_rl try to maximize expected rewards. The rewards used are specified by the --environment argument (it can be Consistency to evaluate coherence of the programs with the observed IO grids, Generalization to also take into account the held-out pair, or Perf to additionally take into account the number of steps taken). When the beam search approximation is used, it is also possible to specify a Reward Combination Function using --reward_comb. The default is RenormExpected, but the "bag of samples" version can be used by choosing X1m1BagExpected for 1/-1 rewards or XBagExpected for the general case. To fit experiments on a single GPU, you may need to adjust --nb_rollouts (how many samples are taken from the model to estimate a gradient when using rl) or --rl_beam (the size of the beam search when using beam_rl). There is also the --rl_inner_batch option, which splits the computation of a batch into several minibatches that are evaluated separately before a gradient step is taken.
  • --optim_alg chooses the optimization algorithm, --batch_size sets the size of the mini-batches and --learning_rate adjusts the learning rate. --init_weights can be used to specify a '.model' file from which to load weights.
  • --train_file specifies the json file containing the training samples and --val_file indicates a validation set. The validation set is used to keep track of the best model seen so far, so as to perform early stopping. The --vocab file gives the correspondence between tokens and indices in the learned predictions. Setting --nb_samples allows training on only part of the dataset (0, the default, trains on the whole dataset). --result_folder indicates where the results of the experiment should be stored. Changing --val_frequency allows evaluating accuracy on the validation set less frequently.
  • Specify --use_cuda to run everything on a GPU. You can use the CUDA_VISIBLE_DEVICES environment variable to run on a specific GPU.
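
The --rl_inner_batch option described above splits a batch into smaller pieces before the gradient step. The following is a minimal sketch of that general idea (gradient accumulation), not the repository's implementation: the toy loss, the function names and the choice to treat the value as a chunk size are all assumptions made for illustration.

```python
# Hypothetical illustration of evaluating one batch in several inner minibatches.
# None of these names come from the repository.

def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of the mean of (w*x - y)^2 over the given samples."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, inner_batch):
    """Evaluate the batch in chunks of `inner_batch` samples and accumulate.

    Each chunk's gradient is weighted by its share of the batch, so the
    total matches the full-batch gradient while only one chunk has to be
    processed at a time.
    """
    total = 0.0
    n = len(xs)
    for start in range(0, n, inner_batch):
        cx = xs[start:start + inner_batch]
        cy = ys[start:start + inner_batch]
        total += grad_mse(w, cx, cy) * len(cx) / n
    return total


xs = [float(i) for i in range(16)]
ys = [3.0 * x + 1.0 for x in xs]
full = grad_mse(0.5, xs, ys)
chunked = accumulated_grad(0.5, xs, ys, inner_batch=8)
print(full, chunked)  # the two values agree up to floating-point rounding
```

Because the accumulated gradient equals the full-batch one, splitting trades speed for memory without changing the update that is applied.
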
# Train a simple supervised model, using the handcoded syntax checker
train_cmd.py --kernel_size 3 \
             --conv_stack "64,64,64" \
             --fc_stack "512" \
             --tgt_embedding_size 256 \
             --lstm_hidden_size 256 \
             --nb_lstm_layers 2 \
             \
             --signal supervised \
             --nb_ios 5 \
             --nb_epochs 100 \
             --optim_alg Adam \
             --batch_size 128 \
             --learning_rate 1e-4 \
             \
             --train_file data/1m_6ex_karel/train.json \
             --val_file data/1m_6ex_karel/val.json \
             --vocab data/1m_6ex_karel/new_vocab.vocab \
             --result_folder exps/supervised_use_grammar \
             \
             --use_grammar \
             \
             --use_cuda
             
# Train a supervised model, learning the grammar at the same time
train_cmd.py --kernel_size 3 \
             --conv_stack "64,64,64" \
             --fc_stack "512" \
             --tgt_embedding_size 256 \
             --lstm_hidden_size 256 \
             --nb_lstm_layers 2 \
             \
             --signal supervised \
             --nb_ios 5 \
             --nb_epochs 100 \
             --optim_alg Adam \
             --batch_size 128 \
             --learning_rate 1e-4 \
             --beta 1e-5 \
             \
             --train_file data/1m_6ex_karel/train.json \
             --val_file data/1m_6ex_karel/val.json \
             --vocab data/1m_6ex_karel/new_vocab.vocab \
             --result_folder exps/supervised_learn_grammar \
             \
             --learn_syntax \
             \
             --use_cuda
             
# Use a pretrained model, to fine-tune it using simple Reinforce
# Change the --environment flag if you want to use a reward including performance.
train_cmd.py  --signal rl \
              --environment BlackBoxGeneralization \
              --nb_rollouts 100 \
              \
              --init_weights exps/supervised_use_grammar/Weights/best.model \
              --nb_epochs 5 \
              --optim_alg Adam \
              --learning_rate 1e-5 \
              --batch_size 16 \
              \
              --train_file data/1m_6ex_karel/train.json \
              --val_file data/1m_6ex_karel/val.json \
              --vocab data/1m_6ex_karel/new_vocab.vocab \
              --result_folder exps/reinforce_finetune \
              \
              --use_grammar \
              \
              --use_cuda
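
To give an intuition for what --nb_rollouts controls, here is a small sketch of a Reinforce-style gradient estimate. It is heavily simplified and not taken from the repository: the "policy" is a single softmax over four candidate programs rather than a token-by-token decoder, and the 1/-1 reward function and all names are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_gradient(logits, reward_fn, nb_rollouts, rng):
    """Monte Carlo estimate of d E[reward] / d logits.

    Uses the score-function identity grad = E[r(a) * grad log p(a)];
    for a softmax policy, grad log p(a) = onehot(a) - probs.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(nb_rollouts):
        # Sample one candidate from the current policy.
        a = rng.choices(range(len(probs)), weights=probs)[0]
        r = reward_fn(a)
        for i, p in enumerate(probs):
            grad[i] += r * ((1.0 if i == a else 0.0) - p) / nb_rollouts
    return grad


# Toy setting: only candidate 2 is "correct" (reward 1), the others get -1.
rng = random.Random(0)
reward = lambda a: 1.0 if a == 2 else -1.0
grad = reinforce_gradient([0.0, 0.0, 0.0, 0.0], reward, nb_rollouts=100, rng=rng)
print(grad)
```

More rollouts reduce the variance of the estimate at the cost of more samples to decode and score per training example, which is why the value may need lowering to fit on a single GPU.
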
              

# Use a pretrained model, fine-tune it using BS Expected reward
# Change the --environment flag if you want to use a reward including performance.
# Change the --reward_comb flag if you want to use one of the "bag of samples" losses.
# Remove the --rl_use_ref flag if you don't want to make use of the known ground truth in 
# the bag.
train_cmd.py  --signal beam_rl \
              --environment BlackBoxGeneralization \
              --reward_comb RenormExpected \
              --rl_inner_batch 8 \
              --rl_use_ref \
              --rl_beam 64 \
              \
              --init_weights exps/supervised_use_grammar/Weights/best.model \
              --nb_epochs 5 \
              --optim_alg Adam \
              --learning_rate 1e-5 \
              --batch_size 16 \
              \
              --train_file data/1m_6ex_karel/train.json \
              --val_file data/1m_6ex_karel/val.json \
              --vocab data/1m_6ex_karel/new_vocab.vocab \
              --result_folder exps/beamrl_finetune \
              \
              --use_grammar \
              \
              --use_cuda
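
One plausible reading of the RenormExpected combination, assumed here rather than taken from the code, is that the probabilities of the programs returned by the beam search are renormalized to sum to one and the reward is averaged under that distribution. The sketch below illustrates this reading; the function name and the toy numbers are hypothetical.

```python
import math


def renorm_expected_reward(log_probs, rewards):
    """Expected reward under the beam's probabilities renormalized to sum to 1.

    log_probs: log-probability of each program kept in the beam.
    rewards:   reward obtained by each of those programs.
    """
    m = max(log_probs)  # subtract the max for numerical stability
    weights = [math.exp(lp - m) for lp in log_probs]
    z = sum(weights)
    return sum(w / z * r for w, r in zip(weights, rewards))


# A beam of three programs holding 0.2, 0.1 and 0.1 of the probability mass.
beam_log_probs = [math.log(0.2), math.log(0.1), math.log(0.1)]
beam_rewards = [1.0, 0.0, 1.0]
value = renorm_expected_reward(beam_log_probs, beam_rewards)
print(value)  # (0.2 + 0.1) / 0.4 = 0.75, up to floating-point rounding
```

Under this reading, a larger --rl_beam covers more of the model's probability mass but requires scoring more programs per sample.
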
             

Evaluation

The evaluation command is fairly similar. Any flag not described below has the same role as for the train_cmd.py command. The relevant file is nps/evaluate.py.

  • --model_weights should point to the model to evaluate.
  • --dataset should point to the json file containing the dataset you want to evaluate against.
  • --output_path points to where the results should be written. This should be a prefix for all the names of the files that will be generated, followed
  • --dump_programs can be used to investigate by dumping the programs returned by the model.
  • --eval_nb_ios is analogous to --nb_ios during training: it sets how many IO pairs are used as input to the model.
  • --val_nb_samples is analogous to --nb_samples and can be used to evaluate on only part of the dataset.
  • --eval_batch_size specifies the batch size to use during decoding. This doesn't affect accuracies; batching only makes decoding faster.
  • --beam_size controls the size of the beam search to run when decoding the programs and --top_k should be the largest integer for which the accuracies should be computed.

This will generate a set of files. If --dump_programs is passed, the --top_k most likely programs for each element of the dataset will be dumped, with their rank and their log-probability in the generated subfolder. This will also include the reference program, under the name target.

The accuracies at various ranks are reported in the generated files. exactmatch corresponds to exactly reproducing the reference program, semantic corresponds to generating a program that is correct on the observed IOs, and fullgeneralize means generating a program that is correct on the observed AND held-out IOs. syntax simply indicates that the program was syntactically correct. If the file semantic_top3.txt contains the number 75.00, this means that for 75.00% of the samples, one of the top 3 programs according to the model is semantically correct on the observed samples.
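
The top-k numbers described above can be sketched as follows. This is an illustration of the metric only, not the code in nps/evaluate.py; the function name and the input layout (one list of per-rank booleans per sample) are assumptions.

```python
def top_k_accuracy(correct_by_rank, k):
    """Percentage of samples with at least one correct program in the top k.

    correct_by_rank[i][j] is True when the rank-j program (0 = most likely)
    for sample i passes the chosen check, e.g. semantic correctness.
    """
    hits = sum(1 for ranks in correct_by_rank if any(ranks[:k]))
    return 100.0 * hits / len(correct_by_rank)


# Four samples with three ranked candidates each (hypothetical data).
results = [
    [False, True, False],
    [True, False, False],
    [False, False, False],
    [False, False, True],
]
print("%.2f" % top_k_accuracy(results, 3))  # 75.00
print("%.2f" % top_k_accuracy(results, 1))  # 25.00
```

Since a sample counts as soon as any of its top k programs is correct, the reported accuracy can only stay equal or increase as k grows.
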

# Evaluate a trained model on the validation set, dumping programs to allow for debugging.
eval_cmd.py --model_weights exps/supervised_use_grammar/Weights/best.model \
            \
            --vocabulary data/1m_6ex_karel/new_vocab.vocab \
            --dataset data/1m_6ex_karel/val.json \
            --eval_nb_ios 5 \
            --eval_batch_size 8 \
            --output_path exps/supervised_use_grammar/Results/ValidationSet_ \
            \
            --beam_size 64 \
            --top_k 10 \
            --dump_programs \
            --use_grammar \
            \
            --use_cuda

# Evaluate a trained model on the test set
eval_cmd.py --model_weights exps/beamrl_finetune/Weights/best.model \
            \
            --vocabulary data/1m_6ex_karel/new_vocab.vocab \
            --dataset data/1m_6ex_karel/test.json \
            --eval_nb_ios 5 \
            --eval_batch_size 8 \
            --output_path exps/beamrl_finetune/Results/TestSet_ \
            \
            --beam_size 64 \
            --top_k 10 \
            --use_grammar \
            \
            --use_cuda

Citation

If you use this code in your research, consider citing:

@Article{Bunel2018,
  author  = {Bunel, Rudy and Hausknecht, Matthew and Devlin, Jacob and Singh, Rishabh and Kohli, Pushmeet},
  title   = {Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis},
  journal = {ICLR},
  year    = {2018},
}