[BUG] Jaccard shuffle fails if shuffled_docs.parquet data already exists from a previous run
ayushdg opened this issue · 0 comments
Describe the bug
Calling jaccard_shuffle on an output directory that already contains shuffled docs from a previous run fails with the following assertion error:
assert bucket_part_start_offset % parts_per_bucket_batch == 0
AssertionError
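Possible workaround until this is fixed: remove the stale shuffle output before rerunning. The snippet below is a minimal sketch, assuming the previous run left a `shuffled_docs.parquet` entry directly under the output directory. The helper name `clear_stale_shuffle_output` is hypothetical and not part of the library; it only uses the Python standard library.

```python
import shutil
from pathlib import Path


def clear_stale_shuffle_output(output_dir, name="shuffled_docs.parquet"):
    """Delete leftover shuffle output under output_dir, if present.

    Hypothetical helper (not a library API). Assumes the stale data
    lives at <output_dir>/<name>, as either a directory or a file.
    Returns True if something was removed, False otherwise.
    """
    target = Path(output_dir) / name
    if target.is_dir():
        # Partitioned parquet datasets are usually written as a directory.
        shutil.rmtree(target)
        return True
    if target.exists():
        # Fall back to a single-file output.
        target.unlink()
        return True
    return False
```

Calling this on the output directory right before jaccard_shuffle should give the shuffle a clean directory to write into. It deletes data, so it is only appropriate when the earlier output is not needed.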