CannyLab/tsne-cuda

Memory leak using fit_transform

DradeAW opened this issue · 4 comments

Hi,

I've been trying to use tsnecuda on my dataset, but I keep getting memory errors even though I'm using a relatively small dataset.

My array is 100,000 x 375 of int16 (~72 MB), and I'm running the software on an RTX 2080 with 8 GB of memory.
When running TSNE(n_components=2).fit_transform(data), the GPU memory usage jumps from 0% to 100% in less than 2 seconds and I get the following error:

terminate called after throwing an instance of 'faiss::FaissException'
  what():  Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /faiss/faiss/gpu/StandardGpuResources.cpp:410: Error: 'err == cudaSuccess' failed: Failed to cudaMalloc 1500000000 bytes on device 1 (error 2 out of memory
Outstanding allocations:
Alloc type FlatData: 2 allocations, 475264 bytes
Alloc type TemporaryMemoryBuffer: 1 allocations, 536870912 bytes
Alloc type Other: 5 allocations, 102200000 bytes
Alloc type IVFLists: 632 allocations, 217884672 bytes

Aborted

This looks like a memory leak?
I've installed faiss and tsnecuda in conda using conda install -c CannyLab -c pytorch tsnecuda, and the test ran without a problem.
This problem happens with both CUDA 10.1 and CUDA 10.2.
I tried tsnecuda a few months ago (sometime in May, I believe), and it worked fine then.
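
For reference, here is roughly the code that triggers it (a minimal sketch; the file name and loading step are just placeholders for however the array is produced):

import numpy as np
from tsnecuda import TSNE

# Illustrative loading step; in practice the array comes from my own pipeline.
data = np.load("features.npy")   # shape (100000, 375), dtype int16, ~72 MB

# This is the call that exhausts the GPU memory.
embedding = TSNE(n_components=2).fit_transform(data)
print(embedding.shape)           # expected (100000, 2)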

Hmm - it looks like we're running out of memory in FAISS (it's unlikely to be a memory leak). Do you have multiple GPUs on this machine (with one that might be a bit smaller)? By default, tsne-cuda now tries to allocate the search on both devices.

I do have 2 GPUs (one only for video output, and the RTX 2080 for computations).

However, when I checked, all 8 GB of the RTX 2080 were being used right before the crash (and none from the other GPU), which is why I didn't think the problem came from there.

Also, I tried running the same code with 40,000 points instead of 100,000 and it runs (but ideally I would like to run it with 300,000).

Can you try running the code with CUDA_VISIBLE_DEVICES=X (where X is the device identifier from nvidia-smi corresponding to the 2080)? Because we use a mirrored split, if you're not careful it will try to put the full NN map on both GPUs (regardless of memory availability).
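
For example, something along these lines (a sketch, not tsne-cuda's documented API; the index is a placeholder, and the variable has to be set before anything initializes CUDA, either on the command line or at the very top of the script):

import os

# Hide every GPU except the 2080 from the CUDA runtime.
# This must happen before CUDA is initialized, i.e. before importing tsnecuda;
# it is equivalent to prefixing the command with CUDA_VISIBLE_DEVICES=0.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # placeholder index; pick the 2080 on your machine

from tsnecuda import TSNE   # safe to import and use TSNE from here on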

Ah, that solved the issue, thanks!

Something weird happened, actually; here is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro K620         On   | 00000000:03:00.0  On |                  N/A |
| 44%   53C    P8     1W /  30W |    591MiB /  1979MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:73:00.0 Off |                  N/A |
| 34%   39C    P8    12W / 215W |      6MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

But when I set CUDA_VISIBLE_DEVICES=1, it actually ran on the Quadro K620. I switched it to =0 and now it runs on the RTX 2080.
It now works with n=300,000 (it seems to take all the memory it can, but does not crash).
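
In case it trips anyone else up: nvidia-smi lists GPUs in PCI bus order, while the CUDA runtime defaults to a fastest-first ordering, which is presumably why index 1 meant the K620 here. A sketch of how to make the two numberings agree, assuming the variables are set before CUDA is initialized:

import os

# Enumerate devices in PCI bus order so CUDA indices match nvidia-smi.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # with PCI ordering, 1 is the RTX 2080 from the table above

from tsnecuda import TSNE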

Thank you!