juglab/EmbedSeg

[BUG] `workers` Parameter not Respected by DataLoaders

authman opened this issue · 1 comment

Describe the bug
Only one worker thread (a single CPU core) is used by the DataLoaders, no matter what the workers parameter is set to.

To Reproduce
Steps to reproduce the behavior:

  1. Spin up any of the training examples
  2. Set batch_size to something respectable, like 512
  3. Adjust the workers DataLoader parameter (e.g. workers=16)
  4. Examine CPU utilization

Expected behavior
Multiple cores get engaged and are used to feed the GPU(s).

Screenshots
Only 1 CPU Core Engaged

Desktop (please complete the following information):

  • OS: Ubuntu 20.04.2 LTS
  • Graphics: 2x GeForce RTX 3090

Additional context

train_dataset_dict = create_dataset_dict(
    data_dir=data_dir,
    project_name=project_name,
    center=center,
    size=train_size,
    batch_size=train_batch_size,
    virtual_batch_multiplier=virtual_train_batch_multiplier,
    normalization_factor=normalization_factor,
    one_hot=one_hot,
    workers=16,
    type='train'
)
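
For reference, my assumption (I have not traced the EmbedSeg code in full) is that this dict is consumed later by the training script to build a standard torch DataLoader, roughly along these lines; the key names and the train_dataset variable below are hypothetical:

train_dataset_it = torch.utils.data.DataLoader(
    train_dataset,                                 # hypothetical dataset built from the same dict
    batch_size=train_dataset_dict['batch_size'],   # hypothetical key
    shuffle=True,
    drop_last=True,
    num_workers=train_dataset_dict['workers'],     # this is what should fan out to 16 worker processes
    pin_memory=True,
)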

To help debug, I put together this dummy script in the same virtual environment:

import numpy as np
import torch
from torch.utils.data import Dataset
from tqdm.auto import tqdm

class TestDS(Dataset):
    """Dummy dataset that burns some CPU per sample so worker usage is visible."""
    def __len__(self):
        return 5000

    def __getitem__(self, index):
        # deliberately slow, pure-Python fill so each sample costs CPU time
        z = np.zeros((256 * 256))
        for i in range(256 * 256):
            z[i] = i
        return z
        

val_dataset = TestDS()
val_dataset_it = torch.utils.data.DataLoader(
    val_dataset,
    batch_size=32,
    shuffle=True,
    drop_last=True,
    num_workers=12,   # same mechanism the EmbedSeg workers parameter should control
    pin_memory=True
)

while True:
    for i, sample in enumerate(tqdm(val_dataset_it)):
        sample = sample.to('cuda:1')  # push batches onto the second GPU
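
A quick way to confirm that worker processes actually get spawned is to log the worker id from inside __getitem__ (a minimal sketch; get_worker_info() returns None when the sample is loaded in the main process):

from torch.utils.data import get_worker_info

class WorkerProbeDS(TestDS):
    def __getitem__(self, index):
        info = get_worker_info()
        # None -> loaded in the main process; otherwise info.id is the worker index
        if index % 1000 == 0:
            print(f"sample {index} loaded by worker {info.id if info is not None else 'main'}")
        return super().__getitem__(index)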

Running the above results in proper core utilization:
Cores Properly Engaged

Even adding the following code at the head of the EmbedSeg training script does not help:

import os
os.environ["MKL_NUM_THREADS"] = "20"
os.environ["OMP_NUM_THREADS"] = "20"

Closing this bug: it looks like the real issue is the nested for loops in the loss function, not the DataLoader workers.
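
For anyone hitting the same symptom: when the loss iterates over the batch and instances with nested Python for loops, everything serializes on the single-threaded interpreter, so neither the GPUs nor the DataLoader workers ever get saturated. A generic illustration of the pattern (not the actual EmbedSeg loss) and its vectorized equivalent, assuming pred and target are 2D tensors of shape (B, N):

import torch

def slow_mse(pred, target):
    # nested Python loops: each iteration dispatches a tiny kernel and
    # keeps exactly one CPU core busy
    total = 0.0
    for b in range(pred.shape[0]):
        for i in range(pred.shape[1]):
            total = total + (pred[b, i] - target[b, i]) ** 2
    return total / pred.numel()

def fast_mse(pred, target):
    # same computation expressed as a single batched tensor op
    return ((pred - target) ** 2).mean()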