allanj/pytorch_neural_crf

purpose of token_type_ids or segments

y1450 opened this issue · 2 comments

y1450 commented

Sorry for asking such a dumb question, but I could not find the purpose of token_type_ids or segment_ids. Why are they stored in the features when they are apparently never used anywhere in the repository?
https://github.com/allanj/pytorch_neural_crf/blob/master/src/data/transformers_dataset.py#L155

Previously, it was used for the BERT model.
But since it is always zero (we have only one type of segment), and some other models such as RoBERTa do not require it, it is no longer used. I just keep it here in case anyone would like to use it for other purposes (for example, other researchers might design different segments).
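
To illustrate why the ids are always zero here, below is a minimal sketch of the BERT-style segment convention. The helper `build_token_type_ids` is hypothetical and is not part of this repository; it only assumes the usual `[CLS] A [SEP]` and `[CLS] A [SEP] B [SEP]` layouts, where the first segment gets id 0 and the second gets id 1.

```python
from typing import List, Optional


def build_token_type_ids(first: List[str],
                         second: Optional[List[str]] = None) -> List[int]:
    """Hypothetical helper: BERT-style segment ids for one or two segments."""
    # [CLS] + first segment + [SEP] all belong to segment 0.
    ids = [0] * (len(first) + 2)
    if second is not None:
        # Second segment + its trailing [SEP] belong to segment 1.
        ids += [1] * (len(second) + 1)
    return ids


# Sequence labeling feeds one sentence at a time, so every id is 0.
single = build_token_type_ids(["John", "lives", "in", "Paris"])
# A sentence pair is the only case where a non-zero id appears.
pair = build_token_type_ids(["Where", "?"], ["In", "Paris"])

print(single)  # [0, 0, 0, 0, 0, 0]
print(pair)    # [0, 0, 0, 0, 1, 1, 1]
```

With a single sentence per example, the vector carries no information, which is why dropping it does not change the model input in any meaningful way.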

y1450 commented

Thanks for your quick response. I am trying to use the CRF implementation and am porting the dataset processing part to the Hugging Face datasets library.