E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 2.96G (3179741184 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory

Do you have any tips for fixing this issue? How can I change the batch size in your code?

(vin_old_tf) mona@goku:~/research/code/GAN-fall/Fall-detection/mrfd$ jupyter notebook demo.ipynb 
[I 10:39:07.568 NotebookApp] Serving notebooks from local directory: /home/mona/research/code/GAN-fall/Fall-detection/mrfd
[I 10:39:07.568 NotebookApp] The Jupyter Notebook is running at:
[I 10:39:07.568 NotebookApp] http://localhost:8888/?token=07f1a05a3a7811cebb6f22c06c2f66db66f88c7bd8bb7529
[I 10:39:07.568 NotebookApp]  or http://127.0.0.1:8888/?token=07f1a05a3a7811cebb6f22c06c2f66db66f88c7bd8bb7529
[I 10:39:07.568 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 10:39:07.586 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///home/mona/.local/share/jupyter/runtime/nbserver-16319-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=07f1a05a3a7811cebb6f22c06c2f66db66f88c7bd8bb7529
     or http://127.0.0.1:8888/?token=07f1a05a3a7811cebb6f22c06c2f66db66f88c7bd8bb7529
[W 10:39:09.420 NotebookApp] Notebook demo.ipynb is not trusted
[I 10:39:10.281 NotebookApp] Kernel started: 7c33aac5-336f-46b3-8751-13fb39130b91
2021-03-10 10:39:41.075299: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2021-03-10 10:39:41.108014: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2599990000 Hz
2021-03-10 10:39:41.108954: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x561ecffe5dc0 executing computations on platform Host. Devices:
2021-03-10 10:39:41.109122: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2021-03-10 10:39:41.114885: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2021-03-10 10:39:42.712191: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-10 10:39:42.712603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce GTX 1650 Ti with Max-Q Design major: 7 minor: 5 memoryClockRate(GHz): 1.2
pciBusID: 0000:01:00.0
2021-03-10 10:39:42.713963: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2021-03-10 10:39:42.731762: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2021-03-10 10:39:42.740063: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2021-03-10 10:39:42.742693: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2021-03-10 10:39:42.761787: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2021-03-10 10:39:42.774385: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2021-03-10 10:39:42.815004: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2021-03-10 10:39:42.815323: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-10 10:39:42.816481: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-10 10:39:42.817355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2021-03-10 10:39:42.817815: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2021-03-10 10:39:42.904318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-10 10:39:42.904382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2021-03-10 10:39:42.904397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2021-03-10 10:39:42.904833: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-10 10:39:42.906003: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-10 10:39:42.906957: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-10 10:39:42.907767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3545 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650 Ti with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-03-10 10:39:42.911276: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x561ecfe20c70 executing computations on platform CUDA. Devices:
2021-03-10 10:39:42.911316: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1650 Ti with Max-Q Design, Compute Capability 7.5
2021-03-10 10:39:51.327981: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2021-03-10 10:39:52.471137: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 2.96G (3179741184 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
^C[I 10:40:01.516 NotebookApp] interrupted
Serving notebooks from local directory: /home/mona/research/code/GAN-fall/Fall-detection/mrfd
1 active kernel
The Jupyter Notebook is running at:
http://localhost:8888/?token=07f1a05a3a7811cebb6f22c06c2f66db66f88c7bd8bb7529
 or http://127.0.0.1:8888/?token=07f1a05a3a7811cebb6f22c06c2f66db66f88c7bd8bb7529
Shutdown this notebook server (y/[n])? y
[C 10:40:02.982 NotebookApp] Shutdown confirmed
[I 10:40:02.982 NotebookApp] Shutting down 1 kernel
[I 10:40:08.256 NotebookApp] Kernel shutdown: 7c33aac5-336f-46b3-8751-13fb39130b91
(vin_old_tf) mona@goku:~/research/code/GAN-fall/Fall-detection/mrfd$

Things you can try:

If you run all the cells, the object detector is also loaded at the beginning, even though it is not needed during prediction. You can save the tracking results, restart the kernel, and load the saved results. There is placeholder code for that scenario: uncomment and modify the second-to-last cell of the person tracking section.
In the demo file, the prediction is done in the 'get_T_S_RE_all_agg' method of the 'fusiondiffroigan' class. The exact line:

Fall-detection/mrfd/trainer/fusiondiffroigan.py

Line 159 in 4fec0b1

recons_seq = model.predict([thermal_data,thermal_masks,diff_masks]) #(samples-win_length+1, win_length, wd,ht,1)

You can create batches before that line and merge the results after it.
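A minimal sketch of that batching idea, assuming the three inputs are NumPy arrays with the same number of samples along axis 0. The helper name `predict_in_batches` and the default `batch_size=8` are hypothetical and not part of the repository. If `model` is a Keras model, passing a smaller `batch_size` directly to `model.predict` may be simpler to try first.

```python
import numpy as np


def predict_in_batches(predict_fn, inputs, batch_size=8):
    """Call predict_fn on slices of the inputs and merge the outputs.

    predict_fn : callable taking a list of arrays (e.g. model.predict)
    inputs     : list of arrays sharing the same length along axis 0
    batch_size : hypothetical default; lower it if memory still runs out
    """
    n_samples = len(inputs[0])
    if n_samples == 0:
        raise ValueError("inputs are empty")
    if any(len(arr) != n_samples for arr in inputs):
        raise ValueError("all inputs must have the same number of samples")

    outputs = []
    for start in range(0, n_samples, batch_size):
        stop = start + batch_size
        # Take the same slice from every input so the samples stay aligned.
        batch = [arr[start:stop] for arr in inputs]
        outputs.append(predict_fn(batch))

    # Merge the per-batch results back into a single array.
    return np.concatenate(outputs, axis=0)


# Possible replacement for the line above (untested against the repository):
# recons_seq = predict_in_batches(
#     model.predict, [thermal_data, thermal_masks, diff_masks], batch_size=8)
```

Only one slice of the inputs is passed to the model per call, which should lower the peak GPU memory needed for prediction. The merged result has the same shape as a single call over all samples.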

Strangely, even though I got the CUDA out-of-memory error, I killed the notebook, ran it again, and it worked. If it works, should I still do the things you mentioned?

I was actually going to open a separate issue to ask: is there a need to uncomment the commented-out block of code that you have an arrow pointing to? If so, when should this be done?

This time my GPU usage was at 88%, but it worked.