k2-fsa/libriheavy

Some questions about the datasets

Opened this issue · 2 comments

What's the difference between test-clean and test-clean large? (Same question for test-other.)

No difference; the large versions are just larger. We guarantee that the test subsets share no books or speakers with the training set, so we can't put them into the training set. We didn't want to waste this part of the data, so we released it too, in case someone wants to test their models on a larger test set.

So I just need to download all the JSON files in run.sh rather than run_pipeline.sh? Does large.tar in run_pipeline.sh include large.json (from run.sh) and test_clean_large.json? And does test_clean have no overlap with test_clean_large?
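
The overlap question can also be checked directly on the downloaded manifests. Below is a minimal sketch, not the project's own tooling. It assumes each manifest is a JSON-lines file (optionally gzipped) in which every line is an object with a `supervisions` list whose items carry a `speaker` field. That layout and the helper names `read_speakers` and `speaker_overlap` are assumptions for illustration; adjust the field lookup to match the actual files.

```python
import gzip
import json
from pathlib import Path


def read_speakers(manifest_path):
    """Collect speaker IDs from a JSON-lines manifest (plain or .gz).

    Assumption: each line is a JSON object with a "supervisions" list
    whose items have a "speaker" field. Change the lookup if the real
    manifest layout differs.
    """
    path = Path(manifest_path)
    opener = gzip.open if path.suffix == ".gz" else open
    speakers = set()
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            cut = json.loads(line)
            for sup in cut.get("supervisions", []):
                speaker = sup.get("speaker")
                if speaker is not None:
                    speakers.add(speaker)
    return speakers


def speaker_overlap(manifest_a, manifest_b):
    """Return the speaker IDs that appear in both manifests."""
    return read_speakers(manifest_a) & read_speakers(manifest_b)
```

Calling `speaker_overlap(path_a, path_b)` on the two test manifests (the paths depend on where you saved them) returns an empty set if they share no speakers. The same pattern would work for books if the manifests expose a book identifier.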