load data error
hxp1024 opened this issue · 5 comments
Hi Allan,
I need your help.
I ran this:
python transformers_trainer.py --device=cuda:0 --dataset=conll2003_sample --model_folder=saved_models --embedder_type=roberta-base
and got this output:
09/15/2022 19:48:30 - INFO - main - device: cuda:0
09/15/2022 19:48:30 - INFO - main - seed: 42
09/15/2022 19:48:30 - INFO - main - dataset: conll2003_sample
09/15/2022 19:48:30 - INFO - main - optimizer: adamw
09/15/2022 19:48:30 - INFO - main - learning_rate: 2e-05
09/15/2022 19:48:30 - INFO - main - momentum: 0.0
09/15/2022 19:48:30 - INFO - main - l2: 1e-08
09/15/2022 19:48:30 - INFO - main - lr_decay: 0
09/15/2022 19:48:30 - INFO - main - batch_size: 30
09/15/2022 19:48:30 - INFO - main - num_epochs: 100
09/15/2022 19:48:30 - INFO - main - train_num: -1
09/15/2022 19:48:30 - INFO - main - dev_num: -1
09/15/2022 19:48:30 - INFO - main - test_num: -1
09/15/2022 19:48:30 - INFO - main - max_no_incre: 80
09/15/2022 19:48:30 - INFO - main - max_grad_norm: 1.0
09/15/2022 19:48:30 - INFO - main - fp16: 0
09/15/2022 19:48:30 - INFO - main - model_folder: saved_models
09/15/2022 19:48:30 - INFO - main - hidden_dim: 0
09/15/2022 19:48:30 - INFO - main - dropout: 0.5
09/15/2022 19:48:30 - INFO - main - embedder_type: roberta-base
09/15/2022 19:48:30 - INFO - main - add_iobes_constraint: 0
09/15/2022 19:48:30 - INFO - main - print_detail_f1: 0
09/15/2022 19:48:30 - INFO - main - earlystop_atr: micro
09/15/2022 19:48:30 - INFO - main - mode: train
09/15/2022 19:48:30 - INFO - main - test_file: data/conll2003_sample/test.txt
09/15/2022 19:48:30 - INFO - main - [Data Info] Tokenizing the instances using 'roberta-base' tokenizer
Ignored unknown kwargs option trim_offsets
09/15/2022 19:48:35 - INFO - main - [Data Info] Reading dataset from:
data/conll2003_sample/train.txt
data/conll2003_sample/dev.txt
data/conll2003_sample/test.txt
09/15/2022 19:48:35 - INFO - src.data.transformers_dataset - [Data Info] Reading file: data/conll2003_sample/train.txt, labels will be converted to IOBES encoding
09/15/2022 19:48:35 - INFO - src.data.transformers_dataset - [Data Info] Modify src/data/transformers_dataset.read_txt function if you have other requirements
100%|████████████████████████████████████████████████████| 79/79 [00:00<00:00, 147990.18it/s]
09/15/2022 19:48:35 - INFO - src.data.transformers_dataset - number of sentences: 5
09/15/2022 19:48:35 - INFO - src.data.transformers_dataset - [Data Info] Using the training set to build label index
09/15/2022 19:48:35 - INFO - src.data.data_utils - #labels: 11
09/15/2022 19:48:35 - INFO - src.data.data_utils - label 2idx: {'': 0, 'S-ORG': 1, 'O': 2, 'S-MISC': 3, 'B-PER': 4, 'E-PER': 5, 'S-LOC': 6, 'B-ORG': 7, 'E-ORG': 8, '': 9, '': 10}
09/15/2022 19:48:35 - INFO - src.data.transformers_dataset - [Data Info] We are not limiting the max length in tokenizer. You should be aware of that
Traceback (most recent call last):
  File "D:\Anaconda3\envs\py37torch17\lib\site-packages\transformers\tokenization_utils_base.py", line 245, in __getattr__
    return self.data[item]
KeyError: 'word_ids'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "transformers_trainer.py", line 260, in <module>
    main()
  File "transformers_trainer.py", line 223, in main
    train_dataset = TransformersNERDataset(conf.train_file, tokenizer, number=conf.train_num, is_train=True)
  File "D:\pyProj\pytorch_neural_crf-master\src\data\transformers_dataset.py", line 95, in __init__
    self.insts_ids = convert_instances_to_feature_tensors(insts, tokenizer, label2idx)
  File "D:\pyProj\pytorch_neural_crf-master\src\data\transformers_dataset.py", line 38, in convert_instances_to_feature_tensors
    subword_idx2word_idx = res.word_ids(batch_index=0)
  File "D:\Anaconda3\envs\py37torch17\lib\site-packages\transformers\tokenization_utils_base.py", line 247, in __getattr__
    raise AttributeError
AttributeError
I think something is failing in pytorch_neural_crf-master\src\data\transformers_dataset.py.
I have tried several times but could not solve it. Can you help me? Thank you.
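For readers puzzled by the two stacked tracebacks: the frames above show a dictionary lookup (`self.data[item]`) failing with `KeyError: 'word_ids'` inside an attribute hook, which then raises a bare `AttributeError`. The sketch below reproduces that pattern with a hypothetical stand-in class (`DictBackedEncoding` is not the library's code, just an illustration built from the two lines visible in the traceback). It suggests the object returned by the tokenizer simply has no `word_ids` method in the installed version.

```python
class DictBackedEncoding:
    """Hypothetical stand-in for the object returned by the tokenizer."""

    def __init__(self, data):
        self.data = data

    def __getattr__(self, item):
        # Python calls this only when normal attribute lookup fails,
        # i.e. when the class defines no method or attribute named `item`.
        try:
            return self.data[item]
        except KeyError:
            # Raising inside the except block chains the KeyError, which
            # produces "During handling of the above exception, ...".
            raise AttributeError


res = DictBackedEncoding({"input_ids": [0, 1, 2]})

# Keys stored in the dict are reachable as attributes.
ids = res.input_ids

# A method the class does not define falls through to __getattr__ and fails.
try:
    res.word_ids(batch_index=0)
    failed = False
except AttributeError as err:
    failed = True
    chained = err.__context__  # the original KeyError
```

Because the error message is empty, the chained `KeyError` is the only hint about which attribute was missing.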
Can you let me know your PyTorch and Transformers versions?
pytorch 1.7.1 py3.7_cpu_0 [cpuonly] pytorch
transformers 3.4.0 pypi_0 pypi
You probably need a much newer version of transformers, since I keep updating the code against the latest release. You probably need at least version 4.10.
Let me know if you can't upgrade (for whatever reason). I can try to find some previous commits that you might be able to work with.
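A quick way to check an environment against the 4.10 minimum suggested above is sketched below. The helper names (`version_tuple`, `meets_minimum`, `installed_version`) are made up for this example, and the parsing is deliberately simple: it compares only the leading numeric components, so pre-release suffixes are ignored. Comparing numerically matters because a plain string comparison would rank "4.9" above "4.10".

```python
import re


def version_tuple(version):
    """Turn '4.10.0' into (4, 10, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum="4.10"):
    """Return True if `installed` is at least `minimum` (numeric comparison)."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.10' and '4.10.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package="transformers"):
    """Return the installed version string, or None if it cannot be determined."""
    try:
        from importlib import metadata  # available from Python 3.8
    except ImportError:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("could not determine the transformers version")
    else:
        print(found, "ok" if meets_minimum(found) else "older than 4.10")
```

On Python 3.7, as in the environment reported here, `installed_version` returns None because `importlib.metadata` is not available; in that case the version string from `conda list` or `pip show transformers` can be passed to `meets_minimum` directly.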
For example, you can work with this commit:
39c7fe6
It might work, but I can't guarantee it.
Solved the issue by upgrading transformers, thank you.