In the CRF, why do you add 2 to tagset_size?

Question

lvjiujin opened this issue 3 years ago · 1 comment

I can't understand the following code. Why do you add 2 to self.tagset_size, and what are START_TAG and STOP_TAG?

        # We add 2 here, because of START_TAG and STOP_TAG
        # transitions (f_tag_size, t_tag_size), transition value from f_tag to t_tag
        init_transitions = torch.zeros(self.tagset_size + 2, self.tagset_size + 2)

This results in the following error:

  File "E:/Paper/NER/LEBERT_original_code/LEBERT/Trainer.py", line 602, in <module>
    main()
  File "E:/Paper/NER/LEBERT_original_code/LEBERT/Trainer.py", line 579, in main
    train(model, args, train_dataset, dev_dataset, test_dataset, label_vocab, tb_writer)
  File "E:/Paper/NER/LEBERT_original_code/LEBERT/Trainer.py", line 382, in train
    metrics, _ = evaluate(model, args, test_dataset, label_vocab, global_step, description="Test", write_file=True)
  File "E:/Paper/NER/LEBERT_original_code/LEBERT/Trainer.py", line 463, in evaluate
    acc, p, r, f1, all_true_labels, all_pred_labels = seq_f1_with_mask(
  File "E:\Paper\NER\LEBERT_original_code\LEBERT\function\metrics.py", line 37, in seq_f1_with_mask
    tmp_pred.append(label_vocab.convert_id_to_item(all_pred_labels[i][j]).replace("M-", "I-"))
  File "E:\Paper\NER\LEBERT_original_code\LEBERT\feature\vocab.py", line 81, in convert_id_to_item
    return self.idx2item[id]
IndexError: list index out of range

When I change it to init_transitions = torch.zeros(self.tagset_size, self.tagset_size), it works.

Answer 1 · 2021-10-03T07:51:14.000Z

You should read more about CRFs!
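
For readers landing on this thread, here is a minimal sketch of what the two extra entries are usually for. It assumes a common CRF layout in which two virtual tags, START_TAG and STOP_TAG, are appended after the real tags, so the transition matrix has tagset_size + 2 rows and columns. The index layout, the -10000 "forbidden" score, and the helper names build_transitions and viterbi are all illustrative assumptions, not taken from the LEBERT source. The sketch uses plain Python lists rather than torch so that it runs without extra dependencies.

```python
# Assumed constant: a large negative score standing in for "forbidden".
NEG = -10000.0


def build_transitions(tagset_size):
    """Build a (tagset_size + 2) x (tagset_size + 2) transition score matrix.

    Assumed layout (for illustration only): index tagset_size is START_TAG
    and index tagset_size + 1 is STOP_TAG. trans[f][t] scores moving f -> t.
    """
    start, stop = tagset_size, tagset_size + 1
    n = tagset_size + 2
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n):
        trans[i][start] = NEG  # nothing may transition into START
        trans[stop][i] = NEG   # nothing may transition out of STOP
    return trans, start, stop


def viterbi(emissions, trans, start, stop):
    """Return the best tag path, searching over the real tags only.

    emissions: one list of tagset_size scores per token.
    """
    tagset_size = start
    # First token: score of START -> tag plus the emission score.
    score = [trans[start][t] + emissions[0][t] for t in range(tagset_size)]
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for t in range(tagset_size):
            best_prev = max(range(tagset_size),
                            key=lambda p: score[p] + trans[p][t])
            new_score.append(score[best_prev] + trans[best_prev][t] + emit[t])
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)
    # Last token: add the score of tag -> STOP before picking the winner.
    last = max(range(tagset_size), key=lambda t: score[t] + trans[t][stop])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


trans, START, STOP = build_transitions(3)
emissions = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.1, 0.2, 3.0]]
path = viterbi(emissions, trans, START, STOP)
print(path)
```

In this sketch START_TAG and STOP_TAG only score how a sequence begins and ends; the decoder loops over the real tags, so every returned id stays below tagset_size. If a decoder did return one of the two extra indices, looking it up in a label vocabulary with only tagset_size entries would raise an IndexError similar to the one reported. Whether that is the cause here is not confirmed in this thread.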