purpose of token_type_ids or segments
y1450 opened this issue
Sorry for asking such a dumb question, but I could not find the purpose of token_type_ids (segment_ids). Why are they stored in the features when they are apparently never used anywhere in the repository?
https://github.com/allanj/pytorch_neural_crf/blob/master/src/data/transformers_dataset.py#L155
Previously, it was used for the BERT model.
But it is always zero (because we have only one type of segment), and some other models, such as RoBERTa, do not require it at all. I just keep it here in case anyone would like to use it for other purposes (for example, researchers designing different segment schemes).
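To illustrate why the ids are always zero here, the sketch below builds BERT-style segment ids by hand. It assumes the standard BERT layout `[CLS] A [SEP]` for one segment and `[CLS] A [SEP] B [SEP]` for a pair; the helper name `build_token_type_ids` is hypothetical and not part of this repository. With a single sentence (the NER/CRF case), every id is 0. As far as I know, Hugging Face's `BertModel` also falls back to all-zero token_type_ids when none are passed, which is why leaving them out makes no difference for single-segment input.

```python
from typing import List, Optional


def build_token_type_ids(len_a: int, len_b: Optional[int] = None) -> List[int]:
    """Hypothetical helper: segment ids for a BERT-style input.

    len_a / len_b are the number of wordpieces in each segment,
    not counting the special tokens [CLS] and [SEP].
    """
    # [CLS] + segment A + [SEP] all belong to segment 0
    ids = [0] * (len_a + 2)
    if len_b is not None:
        # segment B + final [SEP] belong to segment 1
        ids += [1] * (len_b + 1)
    return ids


print(build_token_type_ids(4))     # [0, 0, 0, 0, 0, 0]
print(build_token_type_ids(3, 2))  # [0, 0, 0, 0, 0, 1, 1, 1]
```

Only a sentence-pair task would ever produce a 1, so for sequence labeling the stored ids carry no information beyond the sequence length.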
Thanks for your quick response. I am trying to use the CRF implementation and port the dataset processing part to the Hugging Face datasets library.