fauxneticien/w2v2-10min-exps

Don't subset dev data

fauxneticien opened this issue · 0 comments

In helpers/utils.py, the following condition subsets both the train and dev data:

if 'subset_train' in data_config:
    df = df.sample(frac=1, random_state=data_config['subset_train']['seed']).copy().reset_index(drop=True)
    df = df[ df['audio'].apply(lambda s: len(s)/16_000).cumsum() <= (60 * data_config['subset_train']['mins']) ].copy().reset_index(drop=True)

To apply the subsetting only to the training data, update the condition to:

if 'subset_train' in data_config and tsv_file == 'train_tsv':
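A minimal sketch of how the guarded condition might behave, assuming the subsetting sits in a function that is called once per split with a tsv_file key. The helper name maybe_subset, the 'eval_tsv' key, and the toy data are hypothetical and not taken from helpers/utils.py:

```python
import pandas as pd

SAMPLE_RATE = 16_000  # samples per second, as in the snippet above


def maybe_subset(df, data_config, tsv_file):
    """Hypothetical helper: subset only when loading the training split."""
    if 'subset_train' in data_config and tsv_file == 'train_tsv':
        cfg = data_config['subset_train']
        # Shuffle reproducibly, then keep rows while total audio <= cfg['mins'] minutes
        df = df.sample(frac=1, random_state=cfg['seed']).reset_index(drop=True)
        durations = df['audio'].apply(lambda s: len(s) / SAMPLE_RATE)
        df = df[durations.cumsum() <= 60 * cfg['mins']].reset_index(drop=True)
    return df


# Toy data: six 20-second clips (a list of zeros stands in for each waveform)
clip = [0.0] * (20 * SAMPLE_RATE)
clips = pd.DataFrame({'audio': [clip] * 6})
data_config = {'subset_train': {'seed': 0, 'mins': 1}}

train_df = maybe_subset(clips, data_config, 'train_tsv')  # cut down to 1 minute
dev_df = maybe_subset(clips, data_config, 'eval_tsv')     # left untouched
```

With this guard, the one-minute budget only trims the training frame, and the dev frame keeps all of its rows.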