HarikalarKutusu/cv-tbox-split-maker

[FR] Add one more splitting algorithm for testing

Closed this issue · 1 comment

nv: seNtences-first with unique Voices

  • Sort unique sentences by their recording count and distribute them to test and dev first, with the rest going to train (80-10-10% train/dev/test)
  • Each voice appears in only one split
  • Ensure voice diversity with a 25-25-50% voice distribution, as in the v1 algorithm
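A minimal Python sketch of how the three nv rules could fit together. Everything here is hypothetical and for illustration only, not the tool's code: the `split_nv` name, the `(client_id, sentence)` tuples, sorting sentences in descending order of recording count, assigning voices to splits at random with a fixed seed, and dropping any recording whose voice belongs to a different split than its sentence. The `discarded` list makes visible how enforcing sentence and voice constraints together can leave part of the dataset unused.

```python
import random
from collections import defaultdict

# Hypothetical record type: (client_id, sentence) pairs standing in for rows of
# a Common Voice TSV file. The real tool's data model may differ.
Recording = tuple[str, str]


def split_nv(
    recordings: list[Recording],
    seed: int = 0,
    sentence_ratios: tuple[float, float] = (0.10, 0.10),  # test, dev share of recordings
    voice_ratios: tuple[float, float] = (0.25, 0.25),  # test, dev share of voices
) -> tuple[dict[str, list[Recording]], list[Recording]]:
    """Hypothetical 'nv' split: sentences first, every voice in exactly one split."""
    # 1. Group recordings by sentence and sort by recording count
    #    (descending order and the alphabetical tie-break are assumptions).
    by_sentence: dict[str, list[Recording]] = defaultdict(list)
    for rec in recordings:
        by_sentence[rec[1]].append(rec)
    sentences = sorted(by_sentence, key=lambda s: (-len(by_sentence[s]), s))

    # 2. Give every voice exactly one split: ~25% test, ~25% dev, rest train.
    #    Random assignment with a fixed seed is an assumption.
    voices = sorted({client for client, _ in recordings})
    random.Random(seed).shuffle(voices)
    n_test = round(len(voices) * voice_ratios[0])
    n_dev = round(len(voices) * voice_ratios[1])
    voice_split: dict[str, str] = {}
    for i, voice in enumerate(voices):
        if i < n_test:
            voice_split[voice] = "test"
        elif i < n_test + n_dev:
            voice_split[voice] = "dev"
        else:
            voice_split[voice] = "train"

    # 3. Hand whole sentences to test, then dev, until each has received ~10%
    #    of all recordings; every remaining sentence goes to train.
    total = len(recordings)
    targets = {"test": sentence_ratios[0] * total, "dev": sentence_ratios[1] * total}
    assigned = {"test": 0, "dev": 0}
    splits: dict[str, list[Recording]] = {"train": [], "dev": [], "test": []}
    discarded: list[Recording] = []
    for sentence in sentences:
        recs = by_sentence[sentence]
        if assigned["test"] < targets["test"]:
            split = "test"
        elif assigned["dev"] < targets["dev"]:
            split = "dev"
        else:
            split = "train"
        if split in assigned:
            assigned[split] += len(recs)
        # 4. Keep a recording only if its voice belongs to the same split;
        #    otherwise it cannot be placed anywhere without breaking a rule.
        for rec in recs:
            if voice_split[rec[0]] == split:
                splits[split].append(rec)
            else:
                discarded.append(rec)
    return splits, discarded


if __name__ == "__main__":
    # Toy data: 8 voices reading 20 sentences, 100 recordings in total.
    demo = [(f"v{i % 8}", f"s{i % 20}") for i in range(100)]
    kept, dropped = split_nv(demo)
    print({name: len(recs) for name, recs in kept.items()}, "discarded:", len(dropped))
```

In this sketch the test/dev targets count every recording of an assigned sentence, including those later discarded, which keeps the pass over sentences single and guaranteed to terminate; counting only kept recordings would instead pull more sentences into test and dev and shrink train further.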

In some cases this algorithm will most probably fail to use the whole dataset, because it tries to enforce both sentence and voice diversity at the same time.

We added two other algorithms (vw & vx), but will not add the one above.