Replicate results
Yifan-Gao opened this issue · 4 comments
Hi Victor,
I am having trouble replicating the experimental results. Because my Turing GPU does not support CUDA 8, I only tried training from scratch and loading your trained models, but neither approach reproduces the results on the dev set.
Dev set | Micro Acc. | Macro Acc. | BLEU-1 | BLEU-4
---|---|---|---|---
E3 paper | 68.02 | 73.36 | 67.14 | 53.67
train from scratch | 66.43 | 72.53 | 57.88 | 41.37
binary model in docker | 66.43 | 72.53 | 57.10 | 40.68
Hi, can you please confirm that the package versions are the same as those used in the docker container, with the sole exception being the CUDA version?
Yes, I saved all package requirements from your docker environment with `pip freeze > requirements.txt` and installed them in my Python 3.6.5 conda environment.
Here are all the packages I use: requirements.txt
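One quick way to confirm the two environments really match is to diff the two `pip freeze` dumps directly. A minimal sketch (the file names below are placeholders for illustration, not files from the repo):

```python
# Hedged sketch: compare two `pip freeze` dumps to spot version mismatches
# between the docker environment and a local conda environment.
# File names are placeholders, not part of the e3 repo.

def parse_freeze(path):
    """Parse `pip freeze` output into {package_name: version}."""
    versions = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "==" not in line:
                continue
            name, version = line.split("==", 1)
            versions[name.lower()] = version
    return versions

docker_env = parse_freeze("requirements_docker.txt")  # dump taken inside the container
local_env = parse_freeze("requirements_local.txt")    # dump from the conda environment

for package in sorted(set(docker_env) | set(local_env)):
    d = docker_env.get(package, "missing")
    l = local_env.get(package, "missing")
    if d != l:
        print(f"{package}: docker={d} local={l}")
```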
By the way, I opened another issue about the preprocessing part. Could that be a reason why the results don't replicate?
I will look at the other issue, but I could not reproduce this problem with docker:
[I] vzhong@host ~/p/e3> env NV_GPU=0 docker/wrap.sh python inference.py --retrieval /opt/save/retrieval.pt --editor /opt/save/editor.pt --verify
++ id -u vzhong
++ id -g vzhong
+ nvidia-docker run --rm -v /home/vzhong/projects/e3:/opt/code -u 1007:1007 vzhong/e3 python inference.py --retrieval /opt/save/retrieval.pt --editor /opt/save/editor.pt --verify
batch: 100%|##########| 116/116 [00:08<00:00, 13.74it/s]
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
preprocessing data
loading /opt/code/retrieval_preds.json
{'bleu_1': 0.5315,
'bleu_2': 0.501,
'bleu_3': 0.4801,
'bleu_4': 0.4624,
'combined': 0.33921664,
'macro_accuracy': 0.7336,
'micro_accuracy': 0.6802}
Use device: gpu
---
Loading: tokenize
With settings:
{'model_path': 'cache/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: pos
With settings:
{'model_path': 'cache/en_ewt_models/en_ewt_tagger.pt', 'pretrain_path': 'cache/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: lemma
With settings:
{'model_path': 'cache/en_ewt_models/en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
Building an attentional Seq2Seq model...
Using a Bi-LSTM encoder
Using soft attention for LSTM.
Finetune all embeddings.
[Running seq2seq lemmatizer with edit classifier]
Done loading processors!
---
{'bleu_1': 0.667,
'bleu_2': 0.6041,
'bleu_3': 0.5635,
'bleu_4': 0.5362,
'combined': 0.39335632000000004,
'macro_accuracy': 0.7336,
'micro_accuracy': 0.6802}
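An aside on reading these numbers: in the logs above, the printed `combined` score matches `macro_accuracy * bleu_4` exactly (e.g. 0.7336 * 0.5362 = 0.39335632). This is inferred from the printed values in this thread, not confirmed against the e3 evaluation code. A minimal check:

```python
# Sanity check: the "combined" metric printed above appears to be
# macro_accuracy * bleu_4 (an inference from the numbers in this thread,
# not confirmed against the e3 source).
runs = [
    {"macro_accuracy": 0.7336, "bleu_4": 0.4624, "combined": 0.33921664},           # retrieval-only pass
    {"macro_accuracy": 0.7336, "bleu_4": 0.5362, "combined": 0.39335632000000004},  # final pass
]
for m in runs:
    product = m["macro_accuracy"] * m["bleu_4"]
    assert abs(product - m["combined"]) < 1e-8, (product, m["combined"])
    print(f"{product:.8f} matches combined={m['combined']}")
```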
@vzhong Hi, I was still not able to reproduce the result using the preprocessed file from the other issue (https://github.com/vzhong/e3/issues/2#issuecomment-545776436).
I noticed that the number of batch iterations during inference (151) differs from the one reported above (116). Could this be the cause of the discrepancy?
$ NV_GPU=0 docker/wrap.sh python inference.py --retrieval /opt/save/retrieval.pt --editor /opt/save/editor.pt --verify
++ id -u qlin
++ id -g qlin
+ nvidia-docker run --rm -v /home/qlin/workspace/e3:/opt/code -u 1082:1088 vzhong/e3 python inference.py --retrieval /opt/save/retrieval.pt --editor /opt/save/editor.pt --verify
batch: 100%|##########| 151/151 [00:10<00:00, 15.33it/s]
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
preprocessing data
loading /opt/code/retrieval_preds.json
{'bleu_1': 0.4969,
'bleu_2': 0.4596,
'bleu_3': 0.4315,
'bleu_4': 0.4057,
'combined': 0.29802722000000004,
'macro_accuracy': 0.7346,
'micro_accuracy': 0.6714}
Use device: gpu
---
Loading: tokenize
With settings:
{'model_path': 'cache/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: pos
With settings:
{'model_path': 'cache/en_ewt_models/en_ewt_tagger.pt', 'pretrain_path': 'cache/en_ewt_models/en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: lemma
With settings:
{'model_path': 'cache/en_ewt_models/en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
Building an attentional Seq2Seq model...
Using a Bi-LSTM encoder
Using soft attention for LSTM.
Finetune all embeddings.
[Running seq2seq lemmatizer with edit classifier]
Done loading processors!
---
{'bleu_1': 0.6147,
'bleu_2': 0.543,
'bleu_3': 0.4942,
'bleu_4': 0.4594,
'combined': 0.33747524,
'macro_accuracy': 0.7346,
'micro_accuracy': 0.6714}
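One thing worth checking given the 151 vs. 116 batch counts: the two runs appear to iterate over different numbers of dev examples, not just produce different predictions, which would point back at the preprocessing step. A rough sketch for confirming that (the file path, batch size, and JSON layout below are guesses for illustration, not taken from the e3 code):

```python
# Hedged debugging sketch: if the two machines see different numbers of dev
# examples, the batch counts (151 vs. 116) will differ even with identical models.
# The path, batch size, and assumption that the file is a JSON list are guesses.
import json
import math

BATCH_SIZE = 20  # hypothetical; use whatever batch size inference.py actually uses

with open("retrieval_preds.json") as f:
    preds = json.load(f)

num_examples = len(preds)
print("examples:", num_examples)
print("expected batches:", math.ceil(num_examples / BATCH_SIZE))
```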