KeyError: 'token_type_ids'
Closed this issue · 3 comments
anime26398 commented
While running Domain-Cosine Data Selection, I got this error in inputs:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-30-94419d0057ff> in <module>
15 print('{} {} {}'.format(domain, split, len(lines)))
16 if not os.path.exists(save_path):
---> 17 encode_text_file_and_save(file_path, save_path, max_lines_to_encode)
18 else:
19 print('already encoded, skipping...')
<ipython-input-29-b770826372b4> in encode_text_file_and_save(file_path, output_path, max_lines_to_encode)
104 input_features = convert_text_file_to_features(file_path, tokenizer,
105 max_length=128,
--> 106 max_lines_to_encode=max_lines_to_encode)
107 tensor_dataset = features_to_tensor_dataset(input_features)
108 start = time.time()
<ipython-input-29-b770826372b4> in convert_text_file_to_features(file_path, tokenizer, max_length, pad_token, pad_token_segment_id, mask_padding_with_zero, max_lines_to_encode)
33 add_special_tokens=True,
34 max_length=max_length)
---> 35 input_ids, token_type_ids = inputs["input_ids"], inputs["token_type_ids"]
36 attention_mask = [1 if mask_padding_with_zero else 0] * len(input_ids)
37 padding_length = max_length - len(input_ids)
/opt/conda/lib/python3.6/site-packages/transformers/tokenization_utils_base.py in __getitem__(self, item)
228 """
229 if isinstance(item, str):
--> 230 return self.data[item]
231 elif self._encodings is not None:
232 return self._encodings[item]
KeyError: 'token_type_ids'
roeeaharoni commented
This seems like a versioning issue; does the solution here help? lyuqin/HydraNet-WikiSQL#1
Ronalmoo commented
This problem is caused by the call method of the tokenizer class, which may not return token_type_ids by default. Try passing return_token_type_ids=True, e.g.:
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer(
    text=example_text,
    add_special_tokens=True,
    max_length=max_length,
    return_token_type_ids=True,
)
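Alternatively, the calling code can be made tolerant of tokenizers that omit token_type_ids. A minimal sketch (the function name extract_ids and the pad_token_segment_id default are hypothetical, not from the original code): fall back to all-zero segment ids when the key is absent from the encoding dict.

```python
def extract_ids(inputs, pad_token_segment_id=0):
    """Pull input_ids and token_type_ids from a tokenizer output dict.

    `inputs` stands for the dict returned by tokenizer.encode_plus(...);
    .get() with a default avoids the KeyError when the tokenizer does not
    emit token_type_ids (e.g. for models that do not use segment ids).
    """
    input_ids = inputs["input_ids"]
    token_type_ids = inputs.get(
        "token_type_ids", [pad_token_segment_id] * len(input_ids)
    )
    return input_ids, token_type_ids
```

This avoids pinning the transformers version and works whether or not the tokenizer returns segment ids.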
sanskaromar commented
Adding return_token_type_ids=True to the tokenizer call worked for me.