JiachengLi1995/Recformer

Out of memory in encode_all_items

Closed this issue · 6 comments

Hi, I am trying to train Recformer on my custom dataset. I found that the code tries to encode all items into embeddings and load them onto the GPU; however, my dataset is large, and this always causes out-of-memory issues. Do you know how to solve this problem? Much appreciated!

How many items do you have in your dataset? I assume you were using the fine-tuning code and encoding all items in batches, right? Did the out-of-memory error appear because of a large batch size or a large item embedding table?

Thanks for your reply.
The number of items is around 1.5 million.
Yes, I am using the fine-tuning code.
The error appears because of the large size of the item embedding table, so I convert the item_embeddings on the CPU. I also set the batch_size to 1 during training, but it still produces out-of-memory errors:
Training: 0%| | 21/191000 [00:06<17:29:12, 3.03it/s]
Traceback (most recent call last):
File "/home/ubuntu/qhanliu/disk2/Recformer/finetune.py", line 347, in
main()
File "/home/ubuntu/qhanliu/disk2/Recformer/finetune.py", line 300, in main
train_one_epoch(model, train_loader, optimizer, scheduler, scaler, args)
File "/home/ubuntu/qhanliu/disk2/Recformer/finetune.py", line 128, in train_one_epoch
scaler.scale(loss).backward()
File "/opt/conda/envs/recformer/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/opt/conda/envs/recformer/lib/python3.10/site-packages/torch/autograd/init.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.03 GiB (GPU 2; 31.74 GiB total capacity; 27.25 GiB already allocated; 79.12 MiB free; 31.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
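
For reference, this is roughly how I changed the item encoding so the embedding table stays on the CPU (a simplified sketch, not the repo's actual encode_all_items; the model call and the item batching here are placeholders for my own dataset code):

```python
import torch

@torch.no_grad()
def encode_items_to_cpu(model, item_batches, device):
    # Encode items batch by batch on the GPU, but accumulate the resulting
    # embedding table on the CPU so it never has to fit in GPU memory at once.
    model.eval()
    chunks = []
    for batch in item_batches:  # batch: already-tokenized item texts
        batch = {k: v.to(device) for k, v in batch.items()}
        emb = model(**batch)    # placeholder for the item encoder's forward pass
        chunks.append(emb.cpu())  # offload each chunk to CPU right away
    return torch.cat(chunks, dim=0)  # (num_items, hidden_size), on CPU
```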

One suggestion is to set a shorter maximum sequence length (the default is 1024; you can set it to 512 or 256). Did you use negative sampling or full softmax? If the problem persists, you can try negative sampling by setting finetune_negative_sample_size.

Thanks for your reply. I am using full softmax. What is the suggested value for finetune_negative_sample_size?

This is one of the arguments in our code (you can find it at line 172 of finetune.py). When it is larger than 0, the code automatically uses negative sampling instead of full softmax.
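
To illustrate the difference (a toy sketch, not the actual code in finetune.py; the sampling strategy and tensor names are stand-ins), negative sampling only scores the positives plus a small set of sampled items, while full softmax scores the whole item table:

```python
import torch
import torch.nn.functional as F

def similarity_loss(seq_emb, item_emb_table, target_ids, negative_sample_size=0):
    # seq_emb: (B, H) sequence representations
    # item_emb_table: (N, H) item embedding table
    # target_ids: (B,) ground-truth item ids
    if negative_sample_size > 0:
        # Negative sampling: score positives plus a shared set of sampled items,
        # so memory scales with (B + negative_sample_size) instead of N.
        neg_ids = torch.randint(0, item_emb_table.size(0), (negative_sample_size,))
        candidates = torch.cat([item_emb_table[target_ids], item_emb_table[neg_ids]])
        logits = seq_emb @ candidates.T              # (B, B + num_neg)
        labels = torch.arange(seq_emb.size(0))       # each row's positive is on the diagonal
    else:
        # Full softmax: score every item in the table (memory scales with N).
        logits = seq_emb @ item_emb_table.T          # (B, N)
        labels = target_ids
    return F.cross_entropy(logits, labels)
```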

Thanks! The problem has been solved.