prdwb/bert_hae

Error when running

Alethx opened this issue · 7 comments

When I ran the hae.py file, after about 20 mins, I got this:
Traceback (most recent call last):
File "hae.py", line 104, in <module>
with open(features_fname, 'wb') as handle:
FileNotFoundError: [Errno 2] No such file or directory: '/home/lef/Desktop/bert_output/cache/quac/train_features_False_11.pkl'.

No pkl files were generated in the cache folder. I have checked the directory and found nothing wrong with it. I am wondering why this happened.

I am running this with Python 3.7, CUDA 10, and tf-gpu 1.13.

prdwb commented

Could you confirm that the directory "/home/lef/Desktop/bert_output/cache/quac/" exists? If it does not, we need to create it manually. Thanks.

Thanks for your reply.
I changed the directory, but still no pkl files were generated.
I checked the dataset, model, output, and cache directories and found nothing wrong.
When I ran the model, it printed:

bert_config_file : /home/lef/Desktop/bert/bert_config.json
vocab_file : /home/lef/Desktop/bert/vocab.txt
output_dir : /home/lef/Desktop/bert_output
quac_train_file : /home/lef/Desktop/train_v0.2.json
quac_predict_file : /home/lef/Desktop/val_v0.2.json
init_checkpoint : /home/lef/Desktop/bert/bert_model.ckpt
do_lower_case : True
max_seq_length : 384
doc_stride : 128
max_query_length : 64
do_train : True
do_predict : True
train_batch_size : 6
predict_batch_size : 6
learning_rate : 3e-05
num_train_epochs : 2.0
warmup_proportion : 0.1
save_checkpoints_steps : 1000
evaluation_steps : 5
evaluate_after : 0
iterations_per_loop : 1000
n_best_size : 4
max_answer_length : 30
use_tpu : False
tpu_name : None
tpu_zone : None
gcp_project : None
master : None
num_tpu_cores : 8
verbose_logging : False
history : 6
load_small_portion : True
dataset : quac
cache_dir : /home/lef/Desktop/cache/quac/
max_considered_history_turns : 11
train_steps : 20
attempting to load train features from cache
train feature cache does not exist, generating

However, no pkl file was generated.
What is going wrong?

prdwb commented

It shouldn't be this slow, since "load_small_portion" is on. But since no error message is printed, I would recommend giving it more time. Thanks.

The error still exists.

Traceback (most recent call last):
File "hae.py", line 88, in <module>
with open(features_fname, 'rb') as handle:
FileNotFoundError: [Errno 2] No such file or directory: '/home/lef/Desktop/cache/quac/quac/train_features_False_11.pkl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "hae.py", line 104, in <module>
with open(features_fname, 'wb') as handle:
FileNotFoundError: [Errno 2] No such file or directory: '/home/lef/Desktop/cache/quac/quac/train_features_False_11.pkl'

It seems that the program cannot create folders or files in cache/quac.

prdwb commented

Yes. The program cannot create new directories. We need to create the directory "/home/lef/Desktop/cache/quac/quac/" manually with mkdir. Thanks.
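
For anyone who would rather not create the directory by hand, here is a minimal sketch of how the write step could create it first. The helper name `save_features` is hypothetical and is not part of hae.py; only the `with open(features_fname, 'wb') as handle:` line comes from the traceback above.

```python
import os
import pickle


def save_features(features, features_fname):
    """Pickle `features` to `features_fname`, creating missing parent directories.

    Hypothetical helper for illustration; hae.py itself does not do this.
    """
    parent = os.path.dirname(features_fname)
    if parent:
        # exist_ok=True makes this a no-op when the directory already exists.
        os.makedirs(parent, exist_ok=True)
    # Same open(..., 'wb') call that raised FileNotFoundError in the traceback.
    with open(features_fname, 'wb') as handle:
        pickle.dump(features, handle)
```

With a guard like this, `open(..., 'wb')` no longer fails just because the cache folder is missing.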

prdwb commented

Please follow these steps to resolve the issue:

  1. Pass in "/home/lef/Desktop/cache/" for cache_dir
  2. Manually create the directory "/home/lef/Desktop/cache/quac/"

That would be it. Sorry for the confusion.
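
To illustrate why the doubled "quac/quac" showed up, here is a rough sketch of how the features path appears to be built. It assumes the script appends the dataset name to cache_dir, which is inferred from the paths in the tracebacks rather than quoted from hae.py. The function name `features_path` is hypothetical.

```python
import posixpath

# File name copied from the traceback in this thread.
FEATURES_FILE = "train_features_False_11.pkl"


def features_path(cache_dir, dataset, filename=FEATURES_FILE):
    """Join cache_dir, the dataset name, and the file name (POSIX-style paths).

    Assumption: this mirrors how the script composes the cache path.
    """
    return posixpath.join(cache_dir, dataset, filename)


# cache_dir already ends in "quac/", so the dataset name is appended a second time.
print(features_path("/home/lef/Desktop/cache/quac/", "quac"))

# cache_dir is the base directory, so the file lands in cache/quac/.
print(features_path("/home/lef/Desktop/cache/", "quac"))
```

Under that assumption, the first call reproduces the path from the second traceback, and the second call gives the path that the two steps above are meant to produce.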