[BUG] `workers` Parameter not Respected by DataLoaders
authman opened this issue · 1 comment
authman commented
Describe the bug
Only 1 thread (core) is used for the dataloaders.
To Reproduce
Steps to reproduce the behavior:
- Spin up any of the training examples
- Set `batch_size` to something respectable, like 512
- Adjust the `workers` dataloader parameter
- Examine CPU utilization (see the sketch after this list for one way to watch it)
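For the last step, beyond watching htop, per-core load can also be polled from Python with the third-party psutil package. A minimal sketch (psutil is not part of this repo, it is just a convenience for the check):

import psutil  # third-party: pip install psutil

# Print one utilization percentage per core, once per second.
# With functioning DataLoader workers, many cores should show load.
for _ in range(30):
    print(psutil.cpu_percent(interval=1.0, percpu=True))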
Expected behavior
Multiple cores get engaged and are used to feed the GPU(s).
Desktop (please complete the following information):
- OS: Ubuntu 20.04.2 LTS
- Graphics: 2x GeForce RTX 3090
Additional context
train_dataset_dict = create_dataset_dict(
    data_dir=data_dir,
    project_name=project_name,
    center=center,
    size=train_size,
    batch_size=train_batch_size,
    virtual_batch_multiplier=virtual_train_batch_multiplier,
    normalization_factor=normalization_factor,
    one_hot=one_hot,
    workers=16,
    type='train'
)
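My assumption is that `workers` is meant to be forwarded to the underlying torch.utils.data.DataLoader as num_workers, roughly like the sketch below (hypothetical plumbing, not the library's actual code; make_loader and the dict keys are illustrative):

from torch.utils.data import DataLoader

def make_loader(dataset, dataset_dict):
    # If `workers` never reaches num_workers here, loading stays in the
    # main process regardless of what the caller requested.
    return DataLoader(
        dataset,
        batch_size=dataset_dict['batch_size'],
        shuffle=True,
        num_workers=dataset_dict['workers'],
        pin_memory=True,
    )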
To help debug, I put together this dummy script in the same virtual environment:
import numpy as np
import torch
from torch.utils.data import Dataset
from tqdm.auto import tqdm

class TestDS(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, index):
        # Deliberately CPU-heavy item construction so worker load shows up
        z = np.zeros((256 * 256))
        for i in range(256 * 256):
            z[i] = i
        return z

val_dataset = TestDS()
val_dataset_it = torch.utils.data.DataLoader(
    val_dataset,
    batch_size=32,
    shuffle=True,
    drop_last=True,
    num_workers=12,
    pin_memory=True
)

while True:
    for i, sample in enumerate(tqdm(val_dataset_it)):
        sample = sample.to('cuda:1')
Running the above results in proper core utilization.
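As an extra sanity check that worker processes really spawn, torch.utils.data.get_worker_info() can be queried inside __getitem__. A minimal self-contained sketch (the dataset and sizes are illustrative):

import os
import numpy as np
from torch.utils.data import Dataset, DataLoader, get_worker_info

class WorkerCheckDS(Dataset):
    def __len__(self):
        return 64

    def __getitem__(self, index):
        # get_worker_info() returns None in the main process,
        # and per-worker metadata inside a worker process.
        info = get_worker_info()
        wid = 'main' if info is None else info.id
        print(f"item {index}: worker_id={wid}, pid={os.getpid()}")
        return np.zeros(4)

if __name__ == '__main__':
    loader = DataLoader(WorkerCheckDS(), batch_size=8, num_workers=4)
    for batch in loader:
        pass  # with num_workers=4 the prints should come from 4 distinct PIDs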
Even adding the following code at the head of the EmbedSeg training script does not help:
import os
os.environ["MKL_NUM_THREADS"] = "20"
os.environ["OMP_NUM_THREADS"] = "20"
authman commented
Closing this bug; it looks like the issue is the nested for loops in the loss function, not the dataloader itself.
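For anyone landing here later: a bottleneck like that is typically fixed by vectorizing. A hypothetical before/after (not the actual loss from this repo; slow_l2 and fast_l2 are illustrative names):

import torch

def slow_l2(pred, target):
    # Nested Python loops: single-core, and dominates the step time
    total = 0.0
    for b in range(pred.shape[0]):
        for i in range(pred.shape[1]):
            total += (pred[b, i] - target[b, i]) ** 2
    return total / pred.numel()

def fast_l2(pred, target):
    # The same computation as one vectorized tensor expression
    return ((pred - target) ** 2).mean()

pred = torch.randn(4, 1024)
target = torch.randn(4, 1024)
assert torch.allclose(slow_l2(pred, target), fast_l2(pred, target), atol=1e-4)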