AccelerateHS/accelerate

Space leak in LRU cache

jfennick opened this issue · 1 comment

Attachment: WarGame-exe.pdf (heap profile)

I am submitting a...

  • bug report

Description

There appears to be a space leak in withRemote/updateTask and/or incCount (see the attached heap profile). I have also enabled ekg: acc.gc.current_bytes_remote increases over time, while acc.gc.num_lru_evict remains zero. Unfortunately, I do not yet have a minimal example.
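
For reference, here is a minimal sketch of the kind of thunk build-up I suspect. This is a deliberately simplified model with hypothetical names (leakyBump, a bare IORef Int counter), not accelerate's actual LRU internals: a count that is only ever updated lazily retains a chain of unevaluated (+1) applications whose size grows with the number of updates, even though the logical value stays small.

import Control.Monad (replicateM_)
import Data.IORef

-- Toy stand-in for a cache entry's use count. modifyIORef is lazy:
-- it writes the unevaluated application ((+1) old) back into the
-- IORef, so after n updates the reference retains a chain of n
-- thunks even though the count itself is tiny.
leakyBump :: IORef Int -> IO ()
leakyBump ref = modifyIORef ref (+ 1)

main :: IO ()
main = do
  ref <- newIORef 0
  replicateM_ 1000000 (leakyBump ref)
  readIORef ref >>= print   -- the whole chain is only forced here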

Expected behaviour

Remote memory use stays bounded: once the cache fills, acc.gc.num_lru_evict should start increasing as entries are evicted.

Current behaviour

acc.gc.current_bytes_remote grows steadily over the run while acc.gc.num_lru_evict stays at zero; the heap profile shows the corresponding growth under withRemote/updateTask and incCount.

Possible solution (optional)

Since updateTask decrements the use count and incCount increments it, I suspect that this is a lazy accumulator issue. It probably just needs a strictness annotation somewhere.
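
As a sketch of the kind of strictness fix I have in mind, again on the toy counter above rather than accelerate's real updateTask/incCount (strictBump, Entry, and useCount are hypothetical names):

import Control.Monad (replicateM_)
import Data.IORef

-- Strict variant of the toy counter: modifyIORef' forces the result
-- of (+1) to WHNF before writing it back, so the IORef always holds
-- an evaluated Int and no thunk chain accumulates.
strictBump :: IORef Int -> IO ()
strictBump ref = modifyIORef' ref (+ 1)

-- If the count lives in a record field rather than an IORef, the
-- equivalent fix is a strictness annotation on that field, e.g.
--   data Entry = Entry { useCount :: !Int }
-- (Entry and useCount are hypothetical names, not accelerate's.)

main :: IO ()
main = do
  ref <- newIORef 0
  replicateM_ 1000000 (strictBump ref)
  readIORef ref >>= print   -- constant space; no deferred work left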

Steps to reproduce (for bugs)

None yet; I do not have a minimal example, and the project is closed source. The leak shows up over long runs of the application.

Your environment

  • Accelerate version: 1.2.0.1
  • Accelerate backend(s) used: accelerate-llvm-ptx
  • GHC version: 8.6.5
  • Operating system and version: Ubuntu 18.04
  • Link to your project/example: closed source
  • If this is a bug with the GPU backend, include the output of nvidia-device-query:

CUDA device query (Driver API, statically linked)
CUDA driver version 10.2
CUDA API version 10.1
Detected 1 CUDA capable device

Device 0: GeForce GTX 1060 6GB
CUDA capability: 6.1
CUDA cores: 1280 cores in 10 multiprocessors (128 cores/MP)
Global memory: 6 GB
Constant memory: 64 kB
Shared memory per block: 48 kB
Registers per block: 65536
Warp size: 32
Maximum threads per multiprocessor: 2048
Maximum threads per block: 1024
Maximum grid dimensions: 2147483647 x 65535 x 65535
Maximum block dimensions: 1024 x 1024 x 64
GPU clock rate: 1.7845 GHz
Memory clock rate: 4.004 GHz
Memory bus width: 192-bit
L2 cache size: 2 MB
Maximum texture dimensions
1D: 131072
2D: 131072 x 65536
3D: 16384 x 16384 x 16384
Texture alignment: 512 B
Maximum memory pitch: 2 GB
Concurrent kernel execution: Yes
Concurrent copy and execution: Yes, with 2 copy engines
Runtime limit on kernel execution: Yes
Integrated GPU sharing host memory: No
Host page-locked memory mapping: Yes
ECC memory support: No
Unified addressing (UVA): Yes
Single to double precision performance: 32 : 1
Supports compute pre-emption: Yes
Supports cooperative launch: Yes
Supports multi-device cooperative launch: Yes
PCI bus/location: 6/0
Compute mode: Default
Multiple contexts are allowed on the device simultaneously

For what it's worth, I've been running near-HEAD of accelerate, accelerate-llvm, and accelerate-llvm-ptx in production on many machines, both x86 and aarch64, continuously running many kernels simultaneously for weeks without memory leaks. I think this suggests that either:

  • this has been fixed post 1.2.0.1, or
  • this is somehow kernel-dependent.