Source Code for the C3PO paper published at AACL-IJCNLP 2022 (Student Research Workshop) [Paper]
- Copy Phase
- Generate Phase
- Combine Phase
- Install conda or miniconda
- Create a conda environment
conda env create -f c3po.yml
conda activate c3po
- Download the SPOC dataset as a zip and unzip it into the data/ folder
Data preprocessing of the SPOC dataset, which uses a decision tree to mask tokens with [CPY] tags, is included under cpy_preprocess/.
The preprocessed datasets are released in the data/ folder.
There are 2 versions of both the train and eval datasets.
- Normal masking of tokens with [CPY] tokens, in data/CPY_dataset.pkl (train set) and data/CPY_dataset_eval.pkl (eval set)
- Numbering of the [CPY] tokens as [CPY_1] ... [CPY_n], in data/CPY_dataset_numbered.pkl (train set) and data/CPY_dataset_numbered_eval.pkl (eval set)
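To illustrate the difference between the two dataset versions, here is a minimal sketch of how plain [CPY] masks could be turned into numbered ones. The function name `number_cpy_tokens` and the example tokens are hypothetical, and the sketch assumes numbering restarts for every sequence; the actual preprocessing code is in cpy_preprocess/ and may differ.

```python
def number_cpy_tokens(tokens, mask="[CPY]"):
    """Replace the k-th occurrence of `mask` with [CPY_k] (1-based).

    Hypothetical helper for illustration only, not the repository's code.
    """
    numbered = []
    count = 0
    for token in tokens:
        if token == mask:
            count += 1
            numbered.append(f"[CPY_{count}]")
        else:
            numbered.append(token)
    return numbered


# A made-up masked sequence, not taken from the SPOC dataset.
masked = ["set", "[CPY]", "to", "[CPY]"]
print(number_cpy_tokens(masked))  # ['set', '[CPY_1]', 'to', '[CPY_2]']
```

Numbering gives each placeholder a distinct identity, so it can be matched to a specific copied token.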
Experiments were carried out with the following 2 architectures, which are included in the models/ folder.
- Vanilla Seq2Seq (models/vanilla_seq2seq.py)
- Attention Seq2Seq (models/attention_seq2seq.py)
The training script is train_scripts/seq2seq_training.py.
It can train both architectures.
Usage:
python seq2seq_training.py
    --attention   # Boolean flag: train the Attention Seq2Seq model (default: Vanilla Seq2Seq)
    --non-copy    # Boolean flag: train the non-copy version, with no masking of tokens (default: copy version)
The hyperparameters used are set as the defaults for each model.
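As a rough sketch of how two boolean flags like these behave, the snippet below uses Python's standard argparse module. Only the flag names `--attention` and `--non-copy` come from the usage above; `build_parser`, the help strings, and the use of argparse itself are assumptions, and the real script may parse its arguments differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the two documented flags."""
    parser = argparse.ArgumentParser(
        description="Illustrative sketch, not the repository's parser"
    )
    # store_true flags are False unless passed on the command line.
    parser.add_argument("--attention", action="store_true",
                        help="train the Attention Seq2Seq model")
    parser.add_argument("--non-copy", dest="non_copy", action="store_true",
                        help="train without [CPY] masking")
    return parser


# Equivalent to: python seq2seq_training.py --attention
args = build_parser().parse_args(["--attention"])
print(args.attention, args.non_copy)  # True False
```

Under this reading, passing neither flag selects the Vanilla Seq2Seq model with [CPY] masking.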
Inference scripts are included in inference/scripts/. The predictions are released as .pkl files under inference/predictions/ for the following versions:
- Attention Seq2Seq with C3PO (CPY masking)
- Attention Seq2Seq w/o C3PO (No CPY masking)
- Vanilla Seq2Seq with C3PO (CPY masking)
- Vanilla Seq2Seq w/o C3PO (No CPY masking)
The results (BLEU scores) are computed using these prediction files in inference/results.ipynb.
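For readers unfamiliar with the metric, here is a minimal, standard-library sketch of corpus-level BLEU with one reference per prediction and no smoothing. It is not the notebook's code: the function names and example tokens are hypothetical, and results.ipynb may use a library implementation with different tokenization or smoothing, so scores from this sketch are not guaranteed to match the reported ones.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus BLEU, single reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref, n)
            hyp_counts = ngrams(hyp, n)
            # Clipped counts: credit each n-gram at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


# Made-up tokenized code line, not taken from the released predictions.
reference = "int x = 0 ;".split()
print(corpus_bleu([reference], [reference]))
```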
If you use this work in your research, please cite:
@inproceedings{veerendranath2022c3po,
title={C3PO: A Lightweight Copying Mechanism for Translating Pseudocode to Code},
author={Veerendranath, Vishruth and Masti, Vibha and Anagani, Prajwal and Hr, Mamatha},
booktitle={Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop},
pages={47--53},
year={2022}
}