google-research/deduplicate-text-datasets

one bug when I use

flyingwaters opened this issue · 2 comments

On line 1175/28 of main.rs, "outputs[which_array][index+1]" should actually be "outputs[which_array][index]". If you pass the real length of the duplicated sentence as the "--length-bytes" parameter, you get an out-of-bounds error. I think the "index+1" is what causes it. It is just a small error; the project is good.
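To illustrate the kind of off-by-one I mean, here is a minimal sketch. It is not the code from main.rs: only the names `outputs`, `which_array`, and `index` come from the expression above, while the helper functions, the sample data, and the assumption that `index` can reach the last valid position are all hypothetical.

```rust
// Hypothetical reproduction of the off-by-one; NOT the actual code in main.rs.
// `.get` is used instead of `[]` so the out-of-bounds case shows up as None
// rather than a panic.

// Mirrors the reported expression `outputs[which_array][index + 1]`.
fn read_next(outputs: &[Vec<u64>], which_array: usize, index: usize) -> Option<u64> {
    outputs[which_array].get(index + 1).copied()
}

// Mirrors the proposed fix `outputs[which_array][index]`.
fn read_current(outputs: &[Vec<u64>], which_array: usize, index: usize) -> Option<u64> {
    outputs[which_array].get(index).copied()
}

fn main() {
    // Made-up data standing in for the real output arrays.
    let outputs: Vec<Vec<u64>> = vec![vec![10, 20, 30]];
    let which_array = 0;
    // Assumption: `index` can be the last valid position in the array.
    let index = outputs[which_array].len() - 1;

    // With plain indexing, `outputs[which_array][index + 1]` would panic here.
    println!("index + 1: {:?}", read_next(&outputs, which_array, index));
    println!("index:     {:?}", read_current(&outputs, which_array, index));
}
```

If `index` never reaches the last position in the real code, this sketch does not apply and the cause is somewhere else.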

Thanks! I'll take a look later this week.