facebookresearch/voxpopuli

get_asr_data script fails

Robotuks opened this issue · 3 comments

Hello, I am trying to download ASR data by following the steps:

  1. Downloading all data:
    $ python -m voxpopuli.download_audios --root ./ --subset asr

  2. Segmenting the data, which fails with the following error:

$ python -m voxpopuli.get_asr_data --root ./ --lang en
100%|█████████████████████████████████████████████████████| 412484/412484 [00:10<00:00, 37934.15it/s]
 29%|████████████████▉                                         | 1188/4068 [47:02<1:54:02,  2.38s/it]
Traceback (most recent call last):
  File "/media/data/anaconda3_linux/envs/voxpopuli/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/media/data/anaconda3_linux/envs/voxpopuli/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/media/data/1d_projects/voxpopuli/voxpopuli/get_asr_data.py", line 104, in <module>
    main()
  File "/media/data/1d_projects/voxpopuli/voxpopuli/get_asr_data.py", line 100, in main
    get(args)
  File "/media/data/1d_projects/voxpopuli/voxpopuli/get_asr_data.py", line 70, in get
    multiprocess_run(items, cut_session, n_workers=3)
  File "/media/data/1d_projects/voxpopuli/voxpopuli/utils.py", line 14, in multiprocess_run
    process_map(func, a_list, max_workers=n_workers, chunksize=1)
  File "/media/data/anaconda3_linux/envs/voxpopuli/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 130, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/media/data/anaconda3_linux/envs/voxpopuli/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 76, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, **map_args), **kwargs))
  File "/media/data/anaconda3_linux/envs/voxpopuli/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/media/data/anaconda3_linux/envs/voxpopuli/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/media/data/anaconda3_linux/envs/voxpopuli/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/media/data/anaconda3_linux/envs/voxpopuli/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/media/data/anaconda3_linux/envs/voxpopuli/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

I am using Ubuntu 20.04. Does anyone have an idea what could be wrong?
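
For context, `BrokenProcessPool` is raised when a worker process exits abruptly instead of raising a normal Python exception, for example when the OS kills it (such as for running out of memory) or it crashes in native code. The traceback therefore does not show the real cause. One way to narrow it down is to run the suspect work in a separate interpreter and look at how that child process ended. The sketch below is only an illustration: `describe_exit` and `run_isolated` are hypothetical helpers, not part of voxpopuli, and the snippet passed to `run_isolated` is whatever code you want to test in isolation.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a child's return code into a readable diagnosis."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was
        # terminated by the signal with that (positive) number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with code {returncode}"


def run_isolated(python_code: str, timeout: float = 60.0) -> str:
    """Run a snippet in a fresh interpreter and report how it ended.

    Raises subprocess.TimeoutExpired if the child runs longer than `timeout`.
    """
    proc = subprocess.run(
        [sys.executable, "-c", python_code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return describe_exit(proc.returncode)
```

A result such as "killed by SIGKILL" would point at the OS terminating the process rather than at a bug in the Python code; on Linux the kernel log may then show whether the out-of-memory killer was involved.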

kahne commented

Thanks for reporting the issue!

Is this resolved? I tested with the latest code but couldn't reproduce this error. Do you mind trying again?

kahne commented

I will close this issue for now. Please feel free to reopen it or file a new one if you still need help. Thanks!

Robotuks commented

Sorry for the delayed response.
I tried again from the start and it worked this time. Maybe some files got corrupted.
Thanks for the quick response!
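
In case anyone else hits this and suspects a broken download: a rough sketch for flagging obviously damaged audio files before re-running the segmentation. It assumes the downloaded recordings are Ogg files with an `.ogg` extension somewhere under the download root (an assumption, adjust the pattern to your layout), and `find_suspect_ogg_files` is a hypothetical helper, not part of voxpopuli. It only checks that each file starts with the `OggS` capture pattern, so it catches empty files and non-audio content (for example a saved error page) but not files truncated in the middle.

```python
from pathlib import Path
from typing import List

# Every Ogg page, including the first one in a file, starts with these bytes.
OGG_MAGIC = b"OggS"


def find_suspect_ogg_files(root: str) -> List[Path]:
    """Return .ogg files under `root` that are empty or lack the Ogg header."""
    suspects = []
    for path in sorted(Path(root).rglob("*.ogg")):
        with open(path, "rb") as f:
            header = f.read(len(OGG_MAGIC))
        # An empty file yields b"", which also fails this comparison.
        if header != OGG_MAGIC:
            suspects.append(path)
    return suspects
```

Calling `find_suspect_ogg_files("./")` from the download root returns the paths worth deleting and downloading again; an empty list does not prove the files are intact, only that none failed this basic check.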