jamesmf/cclm

logic for picking random substring in MLM pretrainer

Closed this issue · 1 comment

  • should start at the beginning of a token
  • should avoid a second encode from the tokenizer
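
One way to read the two points above is: pick the substring so it begins on a token boundary, and reuse the character offsets from a single tokenizer pass instead of re-encoding the chosen substring. The sketch below illustrates that reading only; it is not the code from the closing commit. `token_offsets` is a hypothetical whitespace-based stand-in for whatever offsets the real tokenizer reports, and `random_token_aligned_span` and `max_len` are names invented for this example.

```python
import random
import re
from typing import List, Optional, Tuple

Span = Tuple[int, int]


def token_offsets(text: str) -> List[Span]:
    """Hypothetical stand-in for a tokenizer: (start, end) of each whitespace-delimited token."""
    return [m.span() for m in re.finditer(r"\S+", text)]


def random_token_aligned_span(
    text: str,
    max_len: int,
    offsets: Optional[List[Span]] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[int, int, List[Span]]:
    """Pick a random substring of at most max_len characters that starts at a token start.

    Returns (start, end, spans), where spans are the token offsets shifted to be
    relative to the substring, so the substring never has to be encoded again.
    """
    if rng is None:
        rng = random.Random()
    if offsets is None:
        offsets = token_offsets(text)  # the only encode pass
    if not offsets:
        return 0, min(len(text), max_len), []

    first = rng.randrange(len(offsets))  # choose a token, not a character
    start = offsets[first][0]
    end = start
    last = first
    # Extend the window one whole token at a time while it still fits.
    for i in range(first, len(offsets)):
        if offsets[i][1] - start > max_len:
            break
        end = offsets[i][1]
        last = i + 1
    if last == first:
        # The first token alone exceeds max_len: truncate it.
        end = start + max_len
        last = first + 1

    spans = [(s - start, min(e, end) - start) for s, e in offsets[first:last]]
    return start, end, spans


text = "the quick brown fox jumps"
start, end, spans = random_token_aligned_span(text, max_len=11, rng=random.Random(0))
substring = text[start:end]
tokens = [substring[s:e] for s, e in spans]
```

Because the offsets are shifted rather than recomputed, the tokens of the substring come straight from the first pass. With a subword tokenizer that reports character offsets, the same idea should apply, though the truncation branch would need care at subword boundaries.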

closed with 87d2927