l3das/L3DAS21

Training baseline model (Task1)

Closed this issue · 5 comments

Hi,

I am getting the following error when I run 'python train_baseline_task1.py'

Traceback (most recent call last):
  File "train_baseline_task1.py", line 274, in <module>
    main(args)
  File "train_baseline_task1.py", line 58, in main
    training_predictors = pickle.load(f)
EOFError: Ran out of input

l3das commented

Hello anton-jeran.
This error may indicate that you are loading an empty pickle file. You may have accidentally overwritten the pickle files that the script preprocessing.py generates.
Could you please verify that the files you are loading are not empty? The file paths are specified as arguments in train_baseline_task1.py (look for #dataset parameters).
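If it is useful, here is a minimal sketch for such a check: it verifies that the preprocessed files exist, are non-empty and can be unpickled. The paths below are only placeholders; use the ones you actually pass to train_baseline_task1.py.

import os
import pickle

# Placeholder paths: substitute the pickle files passed to train_baseline_task1.py
paths = [
    'processed/task1_predictors_train.pkl',
    'processed/task1_target_train.pkl',
]

for p in paths:
    size = os.path.getsize(p)  # raises FileNotFoundError if the file is missing
    print(p, size, 'bytes')
    if size == 0:
        print('  -> empty file, re-run preprocessing.py')
        continue
    with open(p, 'rb') as f:
        data = pickle.load(f)  # raises EOFError if the file is truncated
    print('  -> loaded OK:', type(data))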

Hi,

[screenshot: terminal output]

I see 2 issues

  1. When I download the datasets, some data goes into the Task1 folder while some goes into a folder named 'Task1'$'\r'. I copied all the data from 'Task1'$'\r' into Task1.

  2. As you can see in the screenshot above, I removed the processed folder and ran your commands from the beginning, but I am still getting the error. Also, the pkl file is not empty.

l3das commented

Issue 1: The folder name in quotes looks like the result of an incomplete download. The download_dataset.py script downloads the desired set as a single zip archive, extracts it to a new folder with the same name, and finally deletes the zip. Alternatively, you can download the datasets manually from this link: https://doi.org/10.5281/zenodo.4642005.
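Just to illustrate that download-extract-delete flow, here is a rough sketch; the URL and folder names are placeholders, not the actual ones used by download_dataset.py.

import os
import zipfile
import urllib.request

# Placeholder URL and paths, for illustration only
url = 'https://zenodo.org/record/4642005/files/<archive_name>.zip'
zip_path = 'DATASETS/Task1.zip'
out_dir = 'DATASETS/Task1'

os.makedirs(out_dir, exist_ok=True)
urllib.request.urlretrieve(url, zip_path)  # download the zip archive
with zipfile.ZipFile(zip_path) as z:
    z.extractall(out_dir)                  # extract into the target folder
os.remove(zip_path)                        # delete the zip once extracted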

Issue 2: As I can see from your terminal, the preprocessing was killed, which usually means you ran out of RAM. That is also why you get the 'Ran out of input' error: the pickle matrices saved by preprocessing.py are incomplete and therefore corrupted. Unfortunately, the dataset is quite big, but you can reduce the memory requirements by preprocessing only a subset of it. To do this, simply add the argument --num_data X, where X is the maximum number of (uncut) datapoints to preprocess for each set.
For example:
python preprocessing.py --task 1 --input_path DATASETS/Task1 --training_set train100 --num_mics 1 --segmentation_len 2 --num_data 200
We will look for a workaround to reduce the memory requirements for the preprocessing.
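In the meantime, one possible direction for such a workaround (only a sketch, not how preprocessing.py currently works; the function names are hypothetical) is to pickle the processed datapoints in chunks, so the full matrix never needs to sit in RAM before a single pickle.dump:

import pickle

def save_in_chunks(datapoint_iter, out_path, chunk_size=100):
    # Write the processed datapoints as a sequence of small pickle frames
    # instead of building one huge list and dumping it in a single call.
    with open(out_path, 'wb') as f:
        chunk = []
        for dp in datapoint_iter:
            chunk.append(dp)
            if len(chunk) == chunk_size:
                pickle.dump(chunk, f)
                chunk = []
        if chunk:
            pickle.dump(chunk, f)

def load_chunks(in_path):
    # Read back every chunk until end-of-file and concatenate them.
    data = []
    with open(in_path, 'rb') as f:
        while True:
            try:
                data.extend(pickle.load(f))
            except EOFError:
                break
    return data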

We hope this helps!

Thanks, now I understand the issue :)

l3das commented

Great! So I'm closing the issue.