vzhong/e3

Replicate results

Yifan-Gao opened this issue · 4 comments

Hi Vector,
I have some problems replicating the experimental results. Because my Turing GPU does not support CUDA 8, I only tried training from scratch and loading your trained models, but neither method replicates the results on the dev set.

| Dev set | Micro | Macro | B1 | B4 |
| --- | --- | --- | --- | --- |
| E3 paper | 68.02 | 73.36 | 67.14 | 53.67 |
| trained from scratch | 66.43 | 72.53 | 57.88 | 41.37 |
| binary model in Docker | 66.43 | 72.53 | 57.1 | 40.68 |

Hi, can you please confirm that the package versions are the same as those used in the Docker container, with the sole exception being the CUDA version?

Yes, I saved all package requirements from your Docker environment with pip freeze > requirements.txt and installed them in my Python 3.6.5 conda environment.
Here are all the packages I use: requirements.txt
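In case it helps, here is a minimal sketch for double-checking that the active environment actually satisfies the frozen requirements (assuming requirements.txt sits in the working directory; pkg_resources flags any missing packages or version conflicts):

```python
import pkg_resources

# Verify the current environment against the requirements frozen from
# the Docker image (requirements.txt assumed to be in the working dir).
with open("requirements.txt") as f:
    reqs = [line.strip() for line in f
            if line.strip() and not line.startswith("#")]

for req in reqs:
    try:
        pkg_resources.require(req)
    except (pkg_resources.DistributionNotFound,
            pkg_resources.VersionConflict) as err:
        print("mismatch:", err)
```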

By the way, I opened another issue about the preprocessing step. Could that be a potential reason for the results not replicating?

I will look at the other issue, but I could not reproduce this problem with Docker:

[I] vzhong@host ~/p/e3> env NV_GPU=0 docker/wrap.sh python inference.py --retrieval /opt/save/retrieval.pt --editor /opt/save/editor.pt --verify     (base)
++ id -u vzhong
++ id -g vzhong
+ nvidia-docker run --rm -v /home/vzhong/projects/e3:/opt/code -u 1007:1007 vzhong/e3 python inference.py --retrieval /opt/save/retrieval.pt --editor /opt/save/editor.pt --verify
batch: 100%|##########| 116/116 [00:08<00:00, 13.74it/s]
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
preprocessing data
loading /opt/code/retrieval_preds.json
{'bleu_1': 0.5315,
 'bleu_2': 0.501,
 'bleu_3': 0.4801,
 'bleu_4': 0.4624,
 'combined': 0.33921664,
 'macro_accuracy': 0.7336,
 'micro_accuracy': 0.6802}
Use device: gpu
---
Loading: tokenize
With settings:
{'model_path': 'cache/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: pos
With settings:
{'model_path': 'cache/en_ewt_models/en_ewt_tagger.pt', 'pretrain_path': 'cache/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: lemma
With settings:
{'model_path': 'cache/en_ewt_models/en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
Building an attentional Seq2Seq model...
Using a Bi-LSTM encoder
Using soft attention for LSTM.
Finetune all embeddings.
[Running seq2seq lemmatizer with edit classifier]
Done loading processors!
---
{'bleu_1': 0.667,
 'bleu_2': 0.6041,
 'bleu_3': 0.5635,
 'bleu_4': 0.5362,
 'combined': 0.39335632000000004,
 'macro_accuracy': 0.7336,
 'micro_accuracy': 0.6802}
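As a side note, the combined figure printed in these logs is consistent with macro_accuracy × bleu_4, which you can check directly against the numbers above:

```python
# The 'combined' score in the logs matches macro_accuracy * bleu_4
# (an observation from the printed numbers above, not necessarily the
# official metric definition).
def combined(macro_accuracy, bleu_4):
    return macro_accuracy * bleu_4

assert abs(combined(0.7336, 0.4624) - 0.33921664) < 1e-9
assert abs(combined(0.7336, 0.5362) - 0.39335632) < 1e-9
```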

@vzhong Hi, I was still not able to reproduce the results using the preprocessed file from the other issue: https://github.com/vzhong/e3/issues/2#issuecomment-545776436.

I noticed that the number of batch iterations during inference (151) differs from the one reported above (116). Could this be the cause of the discrepancy? Log below; see also the sketch after it.

$ NV_GPU=0 docker/wrap.sh python inference.py --retrieval /opt/save/retrieval.pt --editor /opt/save/editor.pt --verify
++ id -u qlin
++ id -g qlin
+ nvidia-docker run --rm -v /home/qlin/workspace/e3:/opt/code -u 1082:1088 vzhong/e3 python inference.py --retrieval /opt/save/retrieval.pt --editor /opt/save/editor.pt --verify
batch: 100%|##########| 151/151 [00:10<00:00, 15.33it/s]
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
preprocessing data
loading /opt/code/retrieval_preds.json
{'bleu_1': 0.4969,
 'bleu_2': 0.4596,
 'bleu_3': 0.4315,
 'bleu_4': 0.4057,
 'combined': 0.29802722000000004,
 'macro_accuracy': 0.7346,
 'micro_accuracy': 0.6714}
Use device: gpu
---
Loading: tokenize
With settings:
{'model_path': 'cache/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: pos
With settings:
{'model_path': 'cache/en_ewt_models/en_ewt_tagger.pt', 'pretrain_path': 'cache/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: lemma
With settings:
{'model_path': 'cache/en_ewt_models/en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
Building an attentional Seq2Seq model...
Using a Bi-LSTM encoder
Using soft attention for LSTM.
Finetune all embeddings.
[Running seq2seq lemmatizer with edit classifier]
Done loading processors!
---
{'bleu_1': 0.6147,
 'bleu_2': 0.543,
 'bleu_3': 0.4942,
 'bleu_4': 0.4594,
 'combined': 0.33747524,
 'macro_accuracy': 0.7346,
 'micro_accuracy': 0.6714}
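For what it's worth, one way to localize the 151-vs-116 discrepancy would be to count the examples feeding the loader; a differently sized preprocessed dev file would directly explain a different iteration count. A minimal sketch, assuming the preprocessed dev set is a JSON list (the path and batch size below are placeholders, not the repo's actual defaults):

```python
import json
import math

# Hypothetical sanity check: count preprocessed dev examples and derive
# the expected number of batches. Path and batch size are placeholders.
with open("sharc/proc/dev.json") as f:  # placeholder path
    dev = json.load(f)

batch_size = 20  # placeholder value
print(len(dev), "examples ->", math.ceil(len(dev) / batch_size), "batches")
```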