amiralansary/rl-medical

DQN.py training demo code error

Closed this issue · 2 comments

The example code works great for the 'eval' and 'play' tasks, but when I tried running the training example, I got errors such as "TypeError: step() missing 1 required positional argument: 'isOver'".

Here is the command that I used:
python DQN.py --task train --algo DQN --gpu 0 --files './data/filenames/image_files.txt' './data/filenames/landmark_files.txt'

Any help you could provide is greatly appreciated! I'm really excited about your published results.

gml16 commented

Thank you for raising this issue @cedwards77.
It should now be working. Please let me know if you encounter further errors.
The error was caused by my refactoring chunks of the code and integrating multiple agents. I have now moved this code to my forked repository and will push it once testing, validation and training are all working.
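For anyone who hits a similar message, here is a minimal sketch of how a refactor can produce this kind of TypeError. The class and helper names below are hypothetical and are not taken from the repository; the only detail borrowed from the report is the `isOver` parameter name. The assumption is that `step()` gained a required argument while an existing call site still used the old form.

```python
class RefactoredEnv:
    """Hypothetical environment whose step() gained a parameter in a refactor."""

    def step(self, action, isOver):
        # After the (assumed) refactor, callers must also pass isOver.
        return action, isOver


def call_with_old_signature(env):
    """Call step() the old way and return the resulting error text, if any."""
    try:
        env.step(0)  # old-style call: isOver is not supplied
    except TypeError as err:
        return str(err)
    return None


message = call_with_old_signature(RefactoredEnv())
print(message)

# Updating the call site to pass the new argument resolves the error.
print(RefactoredEnv().step(0, False))
```

In a case like this, the fix is either to update every caller or to give the new parameter a default value so that old calls keep working.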

Please note that you may get an out-of-memory error if there is not enough GPU memory available, such as:

I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 1613758464 memory_limit_: 2126008811 available bytes: 512250347 curr_region_allocation_bytes_: 1073741824
2020-02-17 13:37:45.319171: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit: 2126008811
InUse: 923033856
MaxInUse: 923034624
NumAllocs: 307
MaxAllocSize: 559872000

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[48,32,45,45,45] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node conv0/Conv3D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
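As a rough check on the numbers in this log, the size of the tensor that failed to allocate can be worked out by hand. The sketch below assumes 4 bytes per element (the log says type float) and assumes the first dimension, 48, is the batch size, which the log does not confirm. The helper name `tensor_bytes` is made up for illustration.

```python
from math import prod  # Python 3.8+


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_element


shape = (48, 32, 45, 45, 45)   # shape reported in the OOM message
available = 512250347          # "available bytes" reported by the allocator

needed = tensor_bytes(shape)
print(needed)                  # 559872000, the same value as MaxAllocSize in the log
print(needed > available)      # True: the request exceeds what is left

# Assuming the first dimension is the batch size, this is the largest batch
# for which this one tensor would fit in the reported available bytes.
per_sample = tensor_bytes(shape[1:])
max_batch = available // per_sample
print(max_batch)               # 43
```

This only accounts for a single tensor and ignores fragmentation and every other allocation made during training, so it is an upper bound at best. A noticeably smaller first dimension, or a GPU with more memory, would likely be needed in practice.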

cedwards77 commented

Thank you for your quick response, @gml16! It seems to be working now.