lhoyer/DAFormer

How much GPU memory is required with the default settings?

DeepHM opened this issue · 4 comments

I'm looking into some research.
I would appreciate it if you could let me know the GPU memory needed for training.

Thank you for your interest in our work. DAFormer was trained on an NVIDIA RTX 2080 Ti with 11 GB of memory. According to the training logs, 9.7 GB of GPU memory was used.

I used an RTX 2080 Ti with 11 GB of memory to run this code with the default parameters and got an error: RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 10.76 GiB total capacity; 9.01 GiB already allocated; 156.00 MiB free; 9.15 GiB reserved in total by PyTorch).
This machine runs only this one program, and the display output occupies 270 MB of GPU memory. I am puzzled as to why "CUDA out of memory" appears.
When I set batch_size=1, training runs normally. It takes 8.5 hours to train 40000 iterations on the GTA->Cityscapes task.
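
The figures in the error message can be broken down with a little arithmetic. The following is a rough sketch that uses only the numbers quoted above; the helper names (parse_oom, to_mib) are made up for illustration, and the reading of the "outside PyTorch" remainder as display output plus CUDA context and similar overhead is an assumption, not something the message states.

```python
import re

# Error text as quoted in this thread (units normalised to MiB/GiB).
message = (
    "CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; "
    "10.76 GiB total capacity; 9.01 GiB already allocated; "
    "156.00 MiB free; 9.15 GiB reserved in total by PyTorch)"
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def to_mib(value, unit):
    """Convert a quantity such as ('9.15', 'GiB') to MiB."""
    return float(value) * UNIT_TO_MIB[unit]


def parse_oom(text):
    """Hypothetical helper: extract the labelled sizes from the message."""
    pattern = (r"([\d.]+) (MiB|GiB) "
               r"(total capacity|already allocated|free|reserved)")
    return {label: to_mib(v, u) for v, u, label in re.findall(pattern, text)}


stats = parse_oom(message)

# Memory held by PyTorch's caching allocator but not used by live tensors.
cached_unused = stats["reserved"] - stats["already allocated"]

# Memory that is neither free nor reserved by PyTorch. Assumption: this is
# where the display output, the CUDA context, and other overhead end up.
outside_pytorch = stats["total capacity"] - stats["reserved"] - stats["free"]

print(f"cached but unused: {cached_unused:.0f} MiB")
print(f"outside PyTorch:   {outside_pytorch:.0f} MiB")
```

Under these numbers, the memory outside PyTorch's reserve is considerably larger than the 270 MB attributed to the display, so the display alone does not account for the whole gap.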

I have used a machine without display output. Maybe this makes the difference.

To reduce GPU memory consumption, you can try sharing a single backward pass between the source loss and the feature distance (FD) loss:

# Train on source images
clean_losses = self.get_model().forward_train(
    img, img_metas, gt_semantic_seg)
clean_loss, clean_log_vars = self._parse_losses(clean_losses)
log_vars.update(clean_log_vars)

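# ImageNet feature distance
# (src_feat holds the source features from the forward pass above;
# extracting it is not shown in this snippet)
if self.enable_fdist:
    feat_loss, feat_log = self.calc_feat_dist(img, gt_semantic_seg, src_feat)
    log_vars.update(add_prefix(feat_log, 'src'))
    # Add the FD loss to the source loss so both share one backward pass
    clean_loss = clean_loss + feat_loss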

# Shared source backward
clean_loss.backward()
del clean_loss
if self.enable_fdist:
    del feat_loss
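
The idea behind the snippet above can be tried in isolation. The following is a minimal sketch with toy stand-in modules (encoder, head, and ref_encoder are hypothetical and are not part of DAFormer); it assumes that the alternative to sharing is two backward calls over one graph with retain_graph=True. It only checks that summing the losses yields the same gradients; it does not measure memory.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins (hypothetical): a trainable encoder and head, plus a frozen
# reference encoder playing the role of the ImageNet model.
encoder = nn.Linear(8, 16)
head = nn.Linear(16, 4)
ref_encoder = nn.Linear(8, 16).requires_grad_(False)
params = list(encoder.parameters()) + list(head.parameters())

x = torch.randn(2, 8)
target = torch.randint(0, 4, (2,))


def compute_losses():
    """One forward pass producing a task loss and a feature distance loss."""
    feat = encoder(x)
    seg_loss = nn.functional.cross_entropy(head(feat), target)
    feat_loss = (feat - ref_encoder(x)).pow(2).mean()
    return seg_loss, feat_loss


def reset_grads():
    for p in params:
        p.grad = None


# Variant A: two backward passes; the graph must be retained in between.
reset_grads()
seg_loss, feat_loss = compute_losses()
seg_loss.backward(retain_graph=True)
feat_loss.backward()
grads_separate = [p.grad.clone() for p in params]

# Variant B: one shared backward pass on the summed loss.
reset_grads()
seg_loss, feat_loss = compute_losses()
total = seg_loss + feat_loss
total.backward()
del total, seg_loss, feat_loss  # drop references so the graph can be freed
grads_shared = [p.grad.clone() for p in params]

same = all(torch.allclose(a, b, atol=1e-6)
           for a, b in zip(grads_separate, grads_shared))
print("gradients match:", same)
```

Because backpropagation is linear in the loss, both variants accumulate the same gradients, so the change should not alter training behaviour.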

Thank you very much for your help. I have solved the problem with your guidance.
Thank you for your team's work, too!