Our code is a modified version of transformer-pointer-generator. The original project only supports datasets in Chinese format and cannot directly process datasets in English format.
When I wanted to generate summaries with a neural network, I tried many ways to produce abstractive summaries, but the results were not good. When I heard about the 2018 Byte Cup, I looked up information about it and the champion's solution attracted me. However, after searching sites such as GitHub and GitLab, I could not find the official code, so I decided to implement it myself.
My model is based on *Attention Is All You Need* and *Get To The Point: Summarization with Pointer-Generator Networks*.
- The pointer-generator model has two mechanisms, the copy mechanism and the coverage mechanism. The materials I found show that the coverage mechanism does not suit short summaries, so I did not use it and kept only the copy mechanism.
- The pointer-generator model has a weakness that can make the loss become NaN. I tried several times to fix it but could not. I think the reason is that when the final logits are calculated, the distribution is extended over the OOV words plus the vocabulary, which produces more zeros. So I removed the mechanism that extends the final logits and only decode over the article and the vocabulary (a minimal sketch of this is shown below). One more detail: in this model I use word pieces rather than a plain word vocabulary; this idea comes from BERT.
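The following is a minimal, illustrative sketch (not the repository's actual code) of how a pointer-generator final distribution can be blended over the fixed vocabulary only, i.e. without extending the logits to an OOV vocabulary; all tensor names are assumptions.

```python
import tensorflow as tf  # TensorFlow 1.x graph-mode style, matching the requirements

def final_distribution(vocab_dists, attn_dists, p_gen, source_ids, vocab_size):
    """Blend the generation and copy distributions over the fixed vocabulary.

    vocab_dists: [batch, vocab_size] softmax over the fixed vocabulary
    attn_dists:  [batch, src_len]    attention weights over source tokens
    p_gen:       [batch, 1]          generation probability
    source_ids:  [batch, src_len]    source token ids (OOV words mapped to UNK)
    """
    batch_size = tf.shape(source_ids)[0]
    src_len = tf.shape(source_ids)[1]
    # Scatter the attention mass back onto the fixed vocabulary ids
    # (no extended vocab, so no extra zero-probability slots for OOV ids).
    batch_nums = tf.tile(tf.expand_dims(tf.range(batch_size), 1), [1, src_len])
    indices = tf.stack([batch_nums, source_ids], axis=2)  # [batch, src_len, 2]
    copy_dists = tf.scatter_nd(indices, attn_dists, [batch_size, vocab_size])
    # Final distribution: weighted sum of generating from the vocab and copying.
    return p_gen * vocab_dists + (1.0 - p_gen) * copy_dists
```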
- python==3.x (let's move on to Python 3 if you still use Python 2)
- tensorflow==1.12.0
- tqdm>=4.28.1
- jieba>=0.3x
- sumeval>=0.2.0
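Assuming a standard pip environment, the dependencies above can be installed with something like:
pip install "tensorflow==1.12.0" "tqdm>=4.28.1" jieba "sumeval>=0.2.0"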
- STEP 1. Create a folder `dataset/your_dataset_name` and create `train_source.txt`, `train_target.txt`, `eval_source.txt`, `eval_target.txt`, `test_source.txt`, `test_target.txt`. Note that each file contains the corresponding text line by line.
- STEP 2. Run the command `python merge_source_target.py` to get the processed files (`train.csv`, `eval.csv`, `test.csv` and `vocab`).
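For example, for a dataset named conala (the name used in the commands below), the folder before preprocessing would look like this (layout inferred from STEP 1; the comments are assumptions about the line-by-line format):

```
dataset/conala/
├── train_source.txt   # one source text per line
├── train_target.txt   # one target summary per line
├── eval_source.txt
├── eval_target.txt
├── test_source.txt
└── test_target.txt
```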
Run the following command.
python train.py
Check `hparams.py` to see which parameters are possible. For example,
python train.py --logdir log/conala --evaldir eval/conala --train dataset/conala/train.csv --eval dataset/conala/eval.csv --vocab dataset/conala/vocab --vocab_size 4137 --maxlen1 200 --maxlen2 50 --batch_size 32
My code also supports multi-GPU training. If you have more than one GPU, run a command like the following, setting `--gpu_nums` to the number of GPUs you want to use:
python train.py --logdir log/conala --evaldir eval/conala --train dataset/conala/train.csv --eval dataset/conala/eval.csv --vocab dataset/conala/vocab --vocab_size 4137 --maxlen1 200 --maxlen2 50 --batch_size 32 --gpu_nums=1
name | type | detail |
---|---|---|
vocab_size | int | vocab size |
train | str | path to the training dataset (train.csv)
eval | str | path to the evaluation dataset (eval.csv)
test | str | path to the test dataset used to calculate the ROUGE score
vocab | str | vocabulary file path |
batch_size | int | train batch size |
eval_batch_size | int | eval batch size |
lr | float | learning rate |
warmup_steps | int | warmup steps for the learning rate schedule
logdir | str | log directory |
num_epochs | int | number of training epochs
evaldir | str | evaluation directory
d_model | int | hidden dimension of encoder/decoder |
d_ff | int | hidden dimension of feedforward layer |
num_blocks | int | number of encoder/decoder blocks |
num_heads | int | number of attention heads |
maxlen1 | int | maximum length of a source sequence |
maxlen2 | int | maximum length of a target sequence |
dropout_rate | float | dropout rate |
beam_size | int | beam size for decoding
gpu_nums | int | number of GPUs to use for training, default 1
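The value passed to `--vocab_size` should match the vocab file produced by `merge_source_target.py`. Assuming one entry per line (special tokens may change the exact count), a quick way to check is:
wc -l dataset/conala/vocab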
Don't change the Transformer hyper-parameters unless you have a better configuration, otherwise the loss may not go down! If you do find a better configuration, I hope you will tell me.
Run the following command:
python pred.py --ckpt log/conala/trans_pointerE3468L0.37-3468 --test dataset/conala/test.csv --vocab dataset/conala/vocab --vocab_size 4137 --maxlen1 200 --maxlen2 50
Run the command `python merge_source_target.py` to get the BLEU-4, ROUGE-L and METEOR scores.
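If you want to compute scores yourself, here is a minimal sketch (not the repository's evaluation code) using the sumeval dependency for ROUGE-L and BLEU; the file names, one-line-per-example format, and simple averaging are assumptions, and METEOR is not covered by sumeval.

```python
from sumeval.metrics.rouge import RougeCalculator
from sumeval.metrics.bleu import BLEUCalculator

# Hypothetical files: one reference / one prediction per line.
with open("dataset/conala/test_target.txt") as f:
    references = [line.strip() for line in f]
with open("predictions.txt") as f:
    predictions = [line.strip() for line in f]

rouge = RougeCalculator(stopwords=False, lang="en")
bleu = BLEUCalculator()

# Average per-example scores over the test set.
rouge_l = sum(rouge.rouge_l(summary=p, references=[r])
              for p, r in zip(predictions, references)) / len(references)
bleu_score = sum(bleu.bleu(summary=p, references=r)
                 for p, r in zip(predictions, references)) / len(references)

print("ROUGE-L: {:.4f}  BLEU: {:.4f}".format(rouge_l, bleu_score))
```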