shangjingbo1226/AutoNER

Loading dataset error

Rock-L opened this issue · 4 comments

When I train the model (./autoner_train.sh), an error occurs like:
Traceback (most recent call last):
File "train_partial_ner.py", line 66, in
dataset = pickle.load(open(args.eval_dataset, 'rb'))
FileNotFoundError: [Errno 2] No such file or directory: './models/BC5CDR/encoded_data/test.pk'

Where can I find the test.pk?

Also, I find that the directory './models/BC5CDR/encoded_data/' is empty, so train_0.pk is missing as well.

Hmm, that looks odd to me. There is a dataset encoding step in autoner_train.sh. Did you see that step complete successfully?

No, it did not complete because it ran out of memory when pickle loaded "embedding.pk". The file is too big, so the process was automatically killed.
with open(args.pre_word_emb, 'rb') as f:
w_emb = pickle.load(f)
I found where this error happens but have no solution.

I see. If that's the case, one solution is to find a machine with more memory. Another solution is to do one round of word embedding filtering, i.e., remove the embeddings of all words that never appear in the corpus.
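A minimal sketch of such a filtering pass, assuming embedding.pk stores a plain word-to-vector dict (the actual AutoNER format may differ) and that the corpus files are whitespace-tokenized text; the file paths and the helper name here are hypothetical, not from the AutoNER code base:

```python
import pickle

def filter_embeddings(emb_path, corpus_paths, out_path):
    """Keep only the embeddings of words that actually appear in the corpus.

    Hypothetical helper: assumes emb_path is a pickled dict mapping
    word -> vector, and corpus_paths are plain text files.
    """
    # Collect the corpus vocabulary (whitespace tokenization assumed).
    vocab = set()
    for path in corpus_paths:
        with open(path, encoding='utf-8') as f:
            for line in f:
                vocab.update(line.split())

    # Load the full pre-trained embedding table once.
    with open(emb_path, 'rb') as f:
        w_emb = pickle.load(f)

    # Drop every word that never occurs in the corpus.
    filtered = {w: v for w, v in w_emb.items() if w in vocab}

    with open(out_path, 'wb') as f:
        pickle.dump(filtered, f)
    return len(w_emb), len(filtered)
```

Note that this filtering step itself still has to load embedding.pk once, so it should be run on a machine (or swap configuration) that can hold the full table in memory one time; afterwards, training only needs the much smaller filtered pickle.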

Yes, we try to keep all pre-trained word embeddings (to ensure the resulting model is as powerful as possible), but this is not necessary for training.