
GPU Memory Allocation with POT + pytorch


Describe the bug

In version 0.9.1 (I haven't checked other versions), when using torch DistributedDataParallel (DDP), importing POT allocates memory on GPU:0 in every process. For example, if DDP is running 4-way, there are 4 extra allocations of ~800 MB each on GPU:0, even when no torch-backed POT functions are used.
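One way to see the allocation is to compare driver-reported memory on GPU:0 before and after the import. Here is a minimal sketch, assuming the nvidia-ml-py package (imported as `pynvml`) is installed; the helper name `used_mib` is just for illustration:

```python
# Sketch: observe GPU:0 usage before and after importing POT (module name `ot`).
# Driver-reported numbers include the CUDA context that torch creates, which
# torch.cuda.memory_allocated() would not show.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def used_mib():
    # Driver-reported used memory on GPU:0, in MiB.
    return pynvml.nvmlDeviceGetMemoryInfo(handle).used / 2**20

before = used_mib()
import ot  # noqa: E402,F401  -- the import alone is enough to trigger the allocation
after = used_mib()
print(f"GPU:0 used before import: {before:.0f} MiB, after: {after:.0f} MiB")
```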

This was a very difficult bug to track down, as I did not expect that importing a package whose torch functionality I am not using would allocate GPU memory. Why does POT need to allocate GPU memory just to set up a backend? Even when a backend is used actively, I would prefer a switch between CPU and GPU, for instance when I need that extra memory for the model or data.

This is partially fixed by PR #520 together with setting an environment variable, but I would greatly prefer that either this did not happen at all, or that POT at least emitted a warning or message such as `POT is using {PACKAGE} backend: allocating GPU memory, set {XXX} to disable`.
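For the workaround, the environment variable has to be set before POT is first imported. A sketch of that pattern follows; the variable name used below is my assumption of what PR #520 introduces, so check the PR for the exact name:

```python
import os

# Assumed variable name from PR #520 (check the PR for the exact spelling);
# it must be set before the first `import ot`.
os.environ["POT_BACKEND_DISABLE_PYTORCH"] = "1"

import ot  # noqa: E402,F401  # with the torch backend disabled, no GPU memory should be touched
```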

Related to issues #516 and #382, and to PR #520.

To Reproduce

With DDP training in PyTorch, run `import torch` followed by `import ot` (the POT module) in each worker process; no POT function needs to be called. A minimal multi-process sketch is below.
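In the sketch, the worker function name and world size are illustrative assumptions; a regular torchrun/DDP training script shows the same behaviour:

```python
# Minimal sketch of the reproduction: a DDP-style multi-process launch where
# each worker only imports POT.
import torch
import torch.multiprocessing as mp

WORLD_SIZE = 4  # e.g. 4-way DDP; adjust to the number of GPUs available

def run_worker(rank):
    torch.cuda.set_device(rank)   # each worker is pinned to its own GPU
    import ot  # noqa: F401       # on 0.9.1 this alone touches GPU:0 in every rank
    # ... normal DDP setup / training would follow here ...

if __name__ == "__main__":
    # After the spawn, `nvidia-smi` shows ~800 MB extra on GPU:0 per worker.
    mp.spawn(run_worker, nprocs=WORLD_SIZE)
```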

Expected behavior

POT should make it clear when it is allocating GPU memory, allocate only when necessary, and, when it does allocate, do so on the correct device.
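A generic sketch, not POT's actual internals, of the kind of lazy, device-aware initialisation this would mean; all names below are illustrative:

```python
# Touch the GPU only when a torch tensor is actually passed in, warn once when
# that happens, and use the input tensor's device instead of defaulting to GPU:0.
import warnings
import torch

_torch_backend_ready = False

def _ensure_torch_backend(device: torch.device) -> None:
    global _torch_backend_ready
    if not _torch_backend_ready:
        warnings.warn(f"torch backend: first device allocation on {device}")
        torch.zeros(1, device=device)  # first allocation happens here, lazily
        _torch_backend_ready = True

def some_ot_function(x: torch.Tensor) -> torch.Tensor:
    _ensure_torch_backend(x.device)  # allocate only when a torch input is used
    return x  # placeholder for the actual OT computation
```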

Hello, we are working on a fix here:
#520

Maybe you could try it and tell us if it works for you.
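For reference, an open PR can usually be tried directly with pip's git support and GitHub's per-PR ref, e.g. `pip install "git+https://github.com/PythonOT/POT.git@refs/pull/520/head"` (assuming your pip and git versions handle that ref).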

Bug fixed in POT 0.9.2, so I am closing this issue.
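For anyone landing here later: upgrading the PyPI package, e.g. `pip install -U "POT>=0.9.2"`, should pick up the fix.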