purpose of token_type_ids or segments

Question

y1450 opened this issue 2 years ago · 2 comments

Sorry for asking such a dumb question, but I could not find what the purpose of token_type_ids (or segment_ids) is. Why are they stored in the features when they are apparently never used anywhere in the repository?
https://github.com/allanj/pytorch_neural_crf/blob/master/src/data/transformers_dataset.py#L155

Answer 1 · 2022-09-13T14:54:12.000Z

Previously, it was used for the BERT model.
But it is always zero, because we only have one type of segment, and some other models such as RoBERTa do not require it at all. I just keep it here in case anyone would like to use it for other purposes (for example, other researchers could be designing different segments).
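To show why the ids are all zero here, the sketch below follows the usual BERT input convention, `[CLS] A [SEP]` for one segment and `[CLS] A [SEP] B [SEP]` for a pair. Tokens in the first segment get 0 and tokens in the second get 1. The helper `build_segments` is hypothetical and is not code from the repository. In practice, Hugging Face BERT tokenizers produce these ids themselves when given a sentence pair.

```python
from typing import List, Optional, Tuple

CLS, SEP = "[CLS]", "[SEP]"


def build_segments(
    tokens_a: List[str], tokens_b: Optional[List[str]] = None
) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: build BERT-style tokens and token_type_ids."""
    # First segment, including [CLS] and its trailing [SEP], gets segment id 0.
    tokens = [CLS] + tokens_a + [SEP]
    token_type_ids = [0] * len(tokens)
    if tokens_b is not None:
        # Second segment plus its trailing [SEP] gets segment id 1.
        tokens += tokens_b + [SEP]
        token_type_ids += [1] * (len(tokens_b) + 1)
    return tokens, token_type_ids


# Single sentence, as in sequence labeling: every id is 0.
_, tt_single = build_segments(["John", "lives", "in", "Berlin"])
print(tt_single)  # [0, 0, 0, 0, 0, 0]

# Sentence pair: the second segment is marked with 1.
_, tt_pair = build_segments(["Who", "?"], ["John", "."])
print(tt_pair)  # [0, 0, 0, 0, 1, 1, 1]
```

Because a sequence-labeling (CRF) input is a single sentence, only the first branch is ever taken. The segment ids therefore carry no information for that task.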

Answer 2 · 2022-09-13T14:58:19.000Z

Thanks for your quick response. I am trying to use the CRF implementation and port the dataset processing part to the Hugging Face datasets library.