Space leak in LRU cache
jfennick opened this issue · 1 comment
I am submitting a...
- [x] bug report
- [ ] feature request
- [ ] support request => you might also like to ask your question on the mailing list or gitter chat.
Description
It appears that there is a space leak in withRemote/updateTask and/or incCount (see the attached heap profile). I have also enabled ekg: acc.gc.current_bytes_remote grows steadily over time, while acc.gc.num_lru_evict remains zero. Unfortunately, I do not yet have a minimal example.
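To give the workload a concrete shape, here is a hypothetical driver (not my actual application, and not verified to reproduce the leak) that repeatedly pushes fresh arrays through the PTX backend; each iteration should allocate device memory through the remote memory manager, i.e. the LRU cache that withRemote/updateTask/incCount belong to.

```haskell
import qualified Data.Array.Accelerate          as A
import qualified Data.Array.Accelerate.LLVM.PTX as PTX
import           Control.Monad                  (forM_)

-- Hypothetical driver: upload a fresh array each iteration and run a
-- small kernel on the GPU. If the LRU cache is evicting properly,
-- acc.gc.current_bytes_remote should stay bounded and
-- acc.gc.num_lru_evict should eventually become non-zero.
main :: IO ()
main =
  forM_ [1 .. 10000 :: Int] $ \i -> do
    let xs = A.fromList (A.Z A.:. 1000000) [fromIntegral i ..] :: A.Vector Float
        r  = PTX.run (A.fold (+) 0 (A.map (* 2) (A.use xs)))
    print (A.indexArray r A.Z)
```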
Expected behaviour
Current behaviour
Possible solution (optional)
Since updateTask decrements the use count and incCount increments it, I suspect that this is a lazy accumulator issue. It probably just needs a strictness annotation somewhere.
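To illustrate the lazy-accumulator pattern I have in mind, here is a minimal sketch (hypothetical helpers, not Accelerate's actual code): a use count updated with a lazy modify retains a chain of unevaluated (+1)/(subtract 1) thunks, whereas a strict modify (or a bang pattern on the stored value) keeps it fully evaluated.

```haskell
import Data.IORef

-- Leaky version: modifyIORef never forces the new value, so every
-- increment/decrement is retained as a thunk even though the logical
-- count stays small.
leakyTouch, leakyRelease :: IORef Int -> IO ()
leakyTouch   ref = modifyIORef ref (+ 1)
leakyRelease ref = modifyIORef ref (subtract 1)

-- Fixed version: modifyIORef' forces the result before writing it back,
-- which is the kind of strictness annotation I suspect is missing.
strictTouch, strictRelease :: IORef Int -> IO ()
strictTouch   ref = modifyIORef' ref (+ 1)
strictRelease ref = modifyIORef' ref (subtract 1)
```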
Steps to reproduce (for bugs)
Your environment
- Accelerate version: 1.2.0.1
- Accelerate backend(s) used: accelerate-llvm-ptx
- GHC version: 8.6.5
- Operating system and version: ubuntu 18.04
- Link to your project/example: closed source
- If this is a bug with the GPU backend, include the output of nvidia-device-query:
CUDA device query (Driver API, statically linked)
CUDA driver version 10.2
CUDA API version 10.1
Detected 1 CUDA capable device
Device 0: GeForce GTX 1060 6GB
CUDA capability: 6.1
CUDA cores: 1280 cores in 10 multiprocessors (128 cores/MP)
Global memory: 6 GB
Constant memory: 64 kB
Shared memory per block: 48 kB
Registers per block: 65536
Warp size: 32
Maximum threads per multiprocessor: 2048
Maximum threads per block: 1024
Maximum grid dimensions: 2147483647 x 65535 x 65535
Maximum block dimensions: 1024 x 1024 x 64
GPU clock rate: 1.7845 GHz
Memory clock rate: 4.004 GHz
Memory bus width: 192-bit
L2 cache size: 2 MB
Maximum texture dimensions
1D: 131072
2D: 131072 x 65536
3D: 16384 x 16384 x 16384
Texture alignment: 512 B
Maximum memory pitch: 2 GB
Concurrent kernel execution: Yes
Concurrent copy and execution: Yes, with 2 copy engines
Runtime limit on kernel execution: Yes
Integrated GPU sharing host memory: No
Host page-locked memory mapping: Yes
ECC memory support: No
Unified addressing (UVA): Yes
Single to double precision performance: 32 : 1
Supports compute pre-emption: Yes
Supports cooperative launch: Yes
Supports multi-device cooperative launch: Yes
PCI bus/location: 6/0
Compute mode: Default
Multiple contexts are allowed on the device simultaneously
For what it's worth, I've been using near-HEAD of accelerate, accelerate-llvm, accelerate-llvm-ptx on many machines in production, both x86 and aarch64, continuously running many kernels simultaneously for weeks without memory leaks. I think this suggests that either:
- This has been fixed post-1.2.0.1.
- This is somehow kernel-dependent.