ValueError: Caught ValueError in DataLoader worker process 0.

Question

debvrat opened this issue 4 years ago · 2 comments

Hi,

I am running this repository's code on the CLEVR dataset. Note that I am using the original dataset (v1.0), not CLEVR6.

I have configured the code to run on 'cpu' because I am using macOS 10.15.3 (no CUDA).
Other environment details: conda 4.8.2, Python 3.7.6, and PyTorch 1.4.0.

This happens right at the start: I placed the dataset in the required folder and ran
python tools/train_net.py --config-file configs/clevr6_prop.yaml

Below is my error trace.
Any idea what's happening here?

No checkpoint found.
Start training
Traceback (most recent call last):
  File "tools/train_net.py", line 90, in <module>
    train_net(cfg)
  File "tools/train_net.py", line 83, in train_net
    evaluator=evaluator
  File "./lib/engine/train.py", line 48, in train
    for iter, data in enumerate(dataloader):
  File "/Users/akv/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/Users/akv/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/Users/akv/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/Users/akv/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/Users/akv/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/Users/akv/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/Users/akv/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "./lib/data/clevr.py", line 25, in __getitem__
    img = io.imread(img_path)[:, :, :3]
  File "/Users/akv/anaconda3/lib/python3.7/site-packages/skimage/io/_io.py", line 48, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
  File "/Users/akv/anaconda3/lib/python3.7/site-packages/skimage/io/manage_plugins.py", line 210, in call_plugin
    return func(*args, **kwargs)
  File "/Users/akv/anaconda3/lib/python3.7/site-packages/skimage/io/_plugins/imageio_plugin.py", line 10, in imread
    return np.asarray(imageio_imread(*args, **kwargs))
  File "/Users/akv/anaconda3/lib/python3.7/site-packages/imageio/core/functions.py", line 264, in imread
    reader = read(uri, format, "i", **kwargs)
  File "/Users/akv/anaconda3/lib/python3.7/site-packages/imageio/core/functions.py", line 182, in get_reader
    "Could not find a format to read the specified file " "in mode %r" % mode
ValueError: Could not find a format to read the specified file in mode 'i'
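The innermost frame is the real failure: skimage.io.imread hands the path to imageio, and imageio raises "Could not find a format to read the specified file" typically when the file exists but none of its installed plugins recognizes it. The traceback does not print img_path, so one way to narrow things down is to scan the image folder for files that are not PNGs (CLEVR v1.0 images are PNG files). On macOS a plausible suspect is a hidden .DS_Store file created by Finder, but that is a guess. The sketch below uses only the standard library; find_non_png_files and the folder argument are hypothetical, and it assumes clevr.py reads every file from a single image directory.

```python
import sys
from pathlib import Path

# The 8-byte signature that every valid PNG file starts with (PNG specification).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def find_non_png_files(image_dir):
    """Return sorted names of regular files in image_dir that do not start
    with the PNG signature (hypothetical helper).

    Hidden files such as .DS_Store, zero-byte files, and stray non-image
    files are reported; subdirectories are ignored.
    """
    bad = []
    for path in sorted(Path(image_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories and other non-file entries
        with path.open("rb") as fh:
            header = fh.read(len(PNG_SIGNATURE))
        if header != PNG_SIGNATURE:
            bad.append(path.name)
    return bad


if __name__ == "__main__":
    # Hypothetical usage: python find_bad_images.py CLEVR_v1.0/images/train
    for name in find_non_png_files(sys.argv[1]):
        print(name)
```

Any name this prints is a candidate for the file io.imread is choking on. Note that a PNG truncated after its header would still pass this check, so an empty result does not fully rule out a damaged image.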

Answer 1 · 2020-03-17T00:39:03.000Z

When I run it on a Windows PC with CUDA, the error trace is slightly different:

No checkpoint found.
Start training
Traceback (most recent call last):
  File "tools/train_net.py", line 90, in <module>
    train_net(cfg)
  File "tools/train_net.py", line 83, in train_net
    evaluator=evaluator
  File ".\lib\engine\train.py", line 48, in train
    for iter, data in enumerate(dataloader):
  File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
ValueError: Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File ".\lib\data\clevr.py", line 25, in __getitem__
    img = io.imread(img_path)[:, :, :3]
  File "C:\Anaconda3\lib\site-packages\skimage\io\_io.py", line 48, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
  File "C:\Anaconda3\lib\site-packages\skimage\io\manage_plugins.py", line 210, in call_plugin
    return func(*args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\skimage\io\_plugins\imageio_plugin.py", line 10, in imread
    return np.asarray(imageio_imread(*args, **kwargs))
  File "C:\Anaconda3\lib\site-packages\imageio\core\functions.py", line 264, in imread
    reader = read(uri, format, "i", **kwargs)
  File "C:\Anaconda3\lib\site-packages\imageio\core\functions.py", line 182, in get_reader
    "Could not find a format to read the specified file " "in mode %r" % mode
ValueError: Could not find a format to read the specified file in mode 'i'

Answer 2 · 2020-03-19T16:49:11.000Z

I fixed the first half of this issue (the "Caught ValueError in DataLoader worker process 0" wrapper) by setting _C.DATALOADER.NUM_WORKERS = 0 in lib/config/defaults.py.
[My device is 'cpu']
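Setting the worker count to 0 makes PyTorch's DataLoader call the dataset's __getitem__ in the main process, so an exception raised there propagates unchanged instead of being re-raised with the "Caught ValueError in DataLoader worker process 0." prefix. The underlying imageio error is not touched by this setting. The toy dataset below (AlwaysFailingDataset and first_batch_error are hypothetical names) stands in for lib/data/clevr.py and only illustrates the num_workers=0 behavior; it is a sketch, not a fix.

```python
from torch.utils.data import DataLoader, Dataset


class AlwaysFailingDataset(Dataset):
    """Hypothetical stand-in for lib/data/clevr.py whose image read always fails."""

    def __len__(self):
        return 4

    def __getitem__(self, idx):
        # Mimics io.imread rejecting a file it cannot decode.
        raise ValueError(f"could not read item {idx}")


def first_batch_error(num_workers):
    """Fetch one batch and return the ValueError message, or None if it succeeds."""
    loader = DataLoader(AlwaysFailingDataset(), batch_size=2, num_workers=num_workers)
    try:
        next(iter(loader))
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # num_workers=0: __getitem__ runs in this process, so the message is unchanged.
    print(first_batch_error(0))
```

With num_workers greater than 0, the same failure would instead surface with the worker prefix and an "Original Traceback" section, as in the traces above.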

The second part of the issue, the "Could not find a format to read the specified file in mode 'i'" error, still exists.
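Assuming clevr.py builds its list of image paths by listing a directory (the trace only shows that it passes img_path to io.imread, so this is a guess), one defensive change is to keep only visible .png files when building that list. The helper name list_clevr_images is hypothetical. It skips hidden entries such as .DS_Store and the ._* AppleDouble files macOS can leave on some external or network drives, as well as files with other extensions.

```python
from pathlib import Path


def list_clevr_images(image_dir, suffix=".png"):
    """Return sorted paths of visible regular files in image_dir with the given suffix.

    Hidden entries (names starting with '.') and files with other extensions
    are skipped, which removes the kinds of entries that commonly trip up
    io.imread. A corrupt .png would still be returned.
    """
    return sorted(
        str(p)
        for p in Path(image_dir).iterdir()
        if p.is_file()
        and not p.name.startswith(".")
        and p.suffix.lower() == suffix
    )
```

This would replace whatever listing call clevr.py currently uses. If the error persists after filtering, the remaining offender is likely a genuinely damaged .png, and comparing each file's first bytes with the PNG signature would help identify it.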