pengbaolin/soloist

Unable to replicate multiwoz results

daxamzn opened this issue · 2 comments

I am following the steps at https://github.com/pengbaolin/soloist/tree/main/examples/multiwoz almost exactly as written (they can't be run exactly as written, because variables like "DECODED_FILE" do not exist).

I am getting this evaluation output:

python /home/ec2-user/soloist/examples/multiwoz/evaluate.py --eval_file /home/ec2-user/test.json --eval_mode test

DST: (0.0, 0, 7372)
0 0
corpus level 0.048745654027320076
test Corpus Matches : 22.00%
test Corpus Success : 2.70%
test Corpus BLEU : 0.01%
Total number of dialogues: 1000
Combined Score 0.12926763704201175

which is far lower than the results expected from the runbook linked above.
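As a sanity check on the numbers above, the printed combined score is consistent with the common MultiWOZ convention of averaging the match and success rates and adding BLEU, with all three expressed as fractions. The sketch below assumes that formula (it is not copied from evaluate.py), and it backs the BLEU fraction out of the printed combined score, because the log only shows BLEU rounded.

```python
def combined_score(match, success, bleu):
    """Assumed MultiWOZ convention: 0.5 * (match + success) + bleu, all as fractions."""
    return 0.5 * (match + success) + bleu


match = 0.22                     # "test Corpus Matches : 22.00%"
success = 0.027                  # "test Corpus Success : 2.70%"
reported = 0.12926763704201175   # "Combined Score" from the log

# BLEU implied by the reported combined score under the assumed formula.
implied_bleu = reported - 0.5 * (match + success)
print(round(implied_bleu, 4))
```

Under this assumption the implied BLEU is roughly 0.0058, so the low combined score comes mostly from the low match and success rates.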

Specifically, I spin up a p3.8xlarge instance on AWS EC2 and then run these steps:

git clone https://github.com/pengbaolin/soloist.git

cd ~/soloist

conda create -n ${EXP_NAME} python=3.6
source activate ${EXP_NAME}
pip install -r requirements.txt

# these are needed for the multiwoz steps despite not being in requirements.txt
pip install simplejson
pip install transformers


wget https://bapengstorage.blob.core.windows.net/soloist/gtg_pretrained.tar.gz
cd ~/soloist/soloist
tar -xvf ../gtg_pretrained.tar.gz

cd ~/soloist/examples/multiwoz
mkdir data
sh fetch_data_and_preprocessing.sh

cd ~/soloist/soloist/

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 \
--nnodes=1 \
--node_rank=0 \
--master_addr="localhost" \
--master_port=8898 soloist_train.py \
--output_dir=multiwoz_models \
--model_type=gpt2 \
--model_name_or_path=gtg_pretrained \
--do_train \
--train_data_file=../examples/multiwoz/train.soloist.json \
--eval_data_file=../examples/multiwoz/valid.soloist.json \
--add_special_action_tokens=../examples/multiwoz/resource/special_tokens.txt \
--per_gpu_train_batch_size 1 \
--num_train_epochs 50 \
--learning_rate 5e-5 \
--overwrite_cache \
--save_steps 5000 \
--max_seq 100 \
--overwrite_output_dir \
--max_turn 15 \
--num_candidates 1 \
--mc_loss_efficient 0.33 \
--add_response_prediction \
--add_same_belief_response_prediction \
--add_belief_prediction

cd /home/ec2-user/soloist/soloist/scripts

python /home/ec2-user/soloist/soloist/soloist_decode.py \
--model_type=gpt2 \
--model_name_or_path=/home/ec2-user/soloist/soloist/multiwoz_models/checkpoint-75000 \
--num_samples 5 \
--input_file=/home/ec2-user/soloist/examples/multiwoz/valid.soloist.json \
--top_p 0.5 \
--temperature 1 \
--output_file=/home/ec2-user/test.json \
--max_turn 15


python -c "import nltk; nltk.download('punkt')"

python /home/ec2-user/soloist/examples/multiwoz/evaluate.py --eval_file /home/ec2-user/test.json --eval_mode test

lqf96 commented

@daxamzn There's actually a small mistake in the steps you posted. The decode_multiwoz.sh script defaults to decoding the validation set for DST and NLG evaluation (see the input_file command-line option). So you need to either change the eval_mode option to valid or change input_file to test.soloist.json, so that decoding and evaluation are performed on the same dataset.
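A small guard can catch this kind of split mismatch before evaluation runs. The sketch below is a hypothetical helper, not part of the soloist repository: `split_of` and `check_same_split` are illustrative names, and the code assumes the split is the first dot-separated part of the file name, as in valid.soloist.json and test.soloist.json.

```python
import os


def split_of(path):
    """Infer the dataset split from a file name such as 'valid.soloist.json'."""
    return os.path.basename(path).split(".")[0]


def check_same_split(input_file, eval_mode):
    """Raise ValueError if the decoded split and the evaluated split differ."""
    split = split_of(input_file)
    if split != eval_mode:
        raise ValueError(f"decoded {split!r} but evaluating {eval_mode!r}")
    return split


# Values from the original post: decoding used the validation file,
# while evaluate.py was called with --eval_mode test.
try:
    check_same_split(
        "/home/ec2-user/soloist/examples/multiwoz/valid.soloist.json", "test"
    )
except ValueError as err:
    print(err)
```

Calling the check with test.soloist.json and test, or with valid.soloist.json and valid, passes without raising.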

But I agree that I can't replicate the results the author posted in the MultiWoZ readme either. The combined score I got in the end is around 0.20. Do you work for Amazon now? If so, maybe you can send your Amazon username to my personal email address. We can then communicate on internal Slack or email about the replication efforts.

BTW @pengbaolin, is there anything you can share or clarify about this situation?

Hi @daxamzn, @lqf96. I'm trying to run the code the same way @daxamzn described and as documented in the MultiWoZ readme. Because I don't have a GPU, I can't use torch.distributed.launch, so I decided to run the code with the following command.

python3.6 ../soloist_train.py \
--output_dir=multiwoz_models \
--model_type=gpt2 \
--model_name_or_path=gtg_pretrained \
--do_train \
--train_data_file=../examples/multiwoz/train.soloist.json \
--eval_data_file=../examples/multiwoz/valid.soloist.json  \
--add_special_action_tokens=../examples/multiwoz/resource/special_tokens.txt \
--per_gpu_train_batch_size 1 \
--num_train_epochs 10 \
--learning_rate 5e-5 \
--overwrite_cache \
--save_steps 10000 \
--max_seq 100 \
--overwrite_output_dir \
--max_turn 15 \
--mc_loss_efficient 0.33 \
--num_candidates 1 \
--add_response_prediction \
--add_same_belief_response_prediction \
--add_belief_prediction

Unfortunately, I'm getting the following error. It looks like the specified pretrained model is of type GPT2LMHeadModel but should be of type GPT2DoubleHeadsModel. Did I miss anything?

2022-06-30 15:21:01.939122: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64/
2022-06-30 15:21:01.939171: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64/
2022-06-30 15:21:01.939179: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
06/30/2022 15:21:02 - WARNING - __main__ -   Process rank: -1, device: cuda, n_gpu: 1, distributed training: False, 16-bits training: False
Traceback (most recent call last):
  File "../soloist_train.py", line 741, in <module>
    main()
  File "../soloist_train.py", line 668, in main
    cache_dir=args.cache_dir if args.cache_dir else None)
  File "/usr/local/lib/python3.6/site-packages/transformers/modeling_utils.py", line 1889, in from_pretrained
    _fast_init=_fast_init,
  File "/usr/local/lib/python3.6/site-packages/transformers/modeling_utils.py", line 2045, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for GPT2DoubleHeadsModel:
        size mismatch for multiple_choice_head.summary.weight: copying a param with shape torch.Size([2, 768]) from checkpoint, the shape in current model is torch.Size([1, 768]).
        size mismatch for multiple_choice_head.summary.bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([1]).
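Reading the traceback closely, the checkpoint already contains a multiple_choice_head, so the failure is a size mismatch in that head (2 outputs in the checkpoint, 1 in the instantiated model) rather than a missing head. The sketch below shows one way to list such mismatches from two name-to-shape mappings. `shape_mismatches` is a hypothetical helper, and the shapes are copied from the error message rather than read from the real files.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters whose shapes differ."""
    return {
        name: (shape, model_shapes[name])
        for name, shape in checkpoint_shapes.items()
        if name in model_shapes and model_shapes[name] != shape
    }


# Shapes copied from the RuntimeError above.
checkpoint = {
    "multiple_choice_head.summary.weight": (2, 768),
    "multiple_choice_head.summary.bias": (2,),
}
model = {
    "multiple_choice_head.summary.weight": (1, 768),
    "multiple_choice_head.summary.bias": (1,),
}

for name, (ckpt, cur) in shape_mismatches(checkpoint, model).items():
    print(f"{name}: checkpoint {ckpt} vs model {cur}")
```

With PyTorch installed, the checkpoint side could be built from the saved weights with `{k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu").items()}`, where `path` points at the weights file inside gtg_pretrained (the exact file name is an assumption here).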