CannyLab/tsne-cuda

Memory leak using fit_transform

DradeAW opened this issue · 4 comments

Hi,

I've been trying to use tsnecuda on my dataset, but I keep getting memory errors even though I'm using a relatively small dataset.

My array is 100,000 x 375 of int16 (~72 MB), and I'm running the software on an RTX 2080 with 8 GB of memory.
When running TSNE(n_components=2).fit_transform(data), the GPU memory usage jumps from 0% to 100% in less than 2 seconds and I get the following error:

terminate called after throwing an instance of 'faiss::FaissException'
  what():  Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /faiss/faiss/gpu/StandardGpuResources.cpp:410: Error: 'err == cudaSuccess' failed: Failed to cudaMalloc 1500000000 bytes on device 1 (error 2 out of memory
Outstanding allocations:
Alloc type FlatData: 2 allocations, 475264 bytes
Alloc type TemporaryMemoryBuffer: 1 allocations, 536870912 bytes
Alloc type Other: 5 allocations, 102200000 bytes
Alloc type IVFLists: 632 allocations, 217884672 bytes

Aborted

This looks like a memory leak?
I've installed faiss and tsnecuda in conda using conda install -c CannyLab -c pytorch tsnecuda, and the test ran without a problem.
This problem happens with both CUDA 10.1 and CUDA 10.2.
I tried tsnecuda a few months ago (sometime in May, I believe), and it worked fine then.
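
For reference, here is roughly the code that triggers it (a minimal sketch; the file name and loading step are just placeholders for however the array is produced):

import numpy as np
from tsnecuda import TSNE

# Illustrative loading step; in practice the array comes from my own pipeline.
data = np.load("features.npy")   # shape (100000, 375), dtype int16, ~72 MB

# This is the call that exhausts the GPU memory.
embedding = TSNE(n_components=2).fit_transform(data)
print(embedding.shape)           # expected (100000, 2)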

Hmm - it looks like we're running out of memory in FAISS (it's unlikely to be a memory leak). Do you have multiple GPUs on this machine (with one that might be a bit smaller)? By default, tsne-cuda now tries to allocate the search on both devices.

I do have 2 GPUs (one only for video output, and the RTX 2080 for computations).

However, when I checked, all 8 GB of the RTX 2080 were being used right before the crash (and none from the other GPU), which is why I didn't think the problem came from there.

Also, I tried running the same code with 40,000 points instead of 100,000 and it runs (but ideally I would like to run it with 300,000).

Can you try running the code with CUDA_VISIBLE_DEVICES=X (where X is the device identifier from nvidia-smi corresponding to the 2080)? Because we use a mirrored split, if you're not careful it will try to put the full NN map on both GPUs (regardless of memory availability).
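
For example, something along these lines (a sketch, not tsne-cuda's documented API; the index is a placeholder, and the variable has to be set before anything initializes CUDA, either on the command line or at the very top of the script):

import os

# Hide every GPU except the 2080 from the CUDA runtime.
# This must happen before CUDA is initialized, i.e. before importing tsnecuda;
# it is equivalent to prefixing the command with CUDA_VISIBLE_DEVICES=0.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # placeholder index; pick the 2080 on your machine

from tsnecuda import TSNE   # safe to import and use TSNE from here on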

Ah, that solved the issue, thanks!

Something weird happened, actually; here is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro K620         On   | 00000000:03:00.0  On |                  N/A |
| 44%   53C    P8     1W /  30W |    591MiB /  1979MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:73:00.0 Off |                  N/A |
| 34%   39C    P8    12W / 215W |      6MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

But when I set CUDA_VISIBLE_DEVICES=1, it actually ran on the Quadro K620. I switched it to =0 and now it runs on the RTX 2080.
It now works with n=300,000 (it seems to take all the memory it can, but does not crash).
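
In case it trips anyone else up: nvidia-smi lists GPUs in PCI bus order, while the CUDA runtime defaults to a fastest-first ordering, which is presumably why index 1 meant the K620 here. A sketch of how to make the two numberings agree, assuming the variables are set before CUDA is initialized:

import os

# Enumerate devices in PCI bus order so CUDA indices match nvidia-smi.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # with PCI ordering, 1 is the RTX 2080 from the table above

from tsnecuda import TSNE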

Thank you!