ilastik/tiktorch

Device is already in use


No job is running, but I still get this error:

Starting server on 127.0.0.1:5567
01:21:59.654 [MainProcess/ThreadPoolExecutor-0_0] INFO Created session ee970ab64d6d4c2586824229be6e9cc0
01:21:59.654 [MainProcess/ThreadPoolExecutor-0_0] DEBUG Registered close handler <bound method _Lease.terminate of <tiktorch.server.device_pool._Lease object at 0x000001DF73655940>> for session ee970ab64d6d4c2586824229be6e9cc0
01:21:59.654 [MainProcess/ThreadPoolExecutor-0_0] DEBUG Registered close handler <tiktorch.rpc.mp.create_client.<locals>._make_method.<locals>.MethodWrapper object at 0x000001DF73678160> for session ee970ab64d6d4c2586824229be6e9cc0
01:22:01.994 [ModelSessionProcess/ModelThread] INFO Starting session worker
01:22:01.994 [ModelSessionProcess/ModelThread] DEBUG Set new state State.Paused
01:23:42.568 [MainProcess/ThreadPoolExecutor-0_0] ERROR Exception calling application: Device cuda:0 is already in use
Traceback (most recent call last):
  File "C:\Anaconda3\envs\tiktorch-server-env\lib\site-packages\grpc\_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "C:\Anaconda3\envs\tiktorch-server-env\lib\site-packages\tiktorch\server\grpc_svc.py", line 25, in CreateModelSession
    lease = self.__device_pool.lease(request.deviceIds)
  File "C:\Anaconda3\envs\tiktorch-server-env\lib\site-packages\tiktorch\server\device_pool.py", line 140, in lease
    raise Exception(f"Device {dev_id} is already in use")
Exception: Device cuda:0 is already in use
01:23:42.601 [MainProcess/ThreadPoolExecutor-0_0] ERROR Exception calling application: Device cuda:0 is already in use
Traceback (most recent call last):
  File "C:\Anaconda3\envs\tiktorch-server-env\lib\site-packages\grpc\_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "C:\Anaconda3\envs\tiktorch-server-env\lib\site-packages\tiktorch\server\grpc_svc.py", line 25, in CreateModelSession
    lease = self.__device_pool.lease(request.deviceIds)
  File "C:\Anaconda3\envs\tiktorch-server-env\lib\site-packages\tiktorch\server\device_pool.py", line 140, in lease
    raise Exception(f"Device {dev_id} is already in use")
Exception: Device cuda:0 is already in use
01:24:16.658 [MainProcess/ThreadPoolExecutor-0_0] INFO Created session 35e3ac3d511344c18dffaf495a234e19
01:24:16.659 [MainProcess/ThreadPoolExecutor-0_0] DEBUG Registered close handler <bound method _Lease.terminate of <tiktorch.server.device_pool._Lease object at 0x000001DF73655910>> for session 35e3ac3d511344c18dffaf495a234e19
01:24:16.659 [MainProcess/ThreadPoolExecutor-0_0] DEBUG Registered close handler <tiktorch.rpc.mp.create_client.<locals>._make_method.<locals>.MethodWrapper object at 0x000001DF73ADD940> for session 35e3ac3d511344c18dffaf495a234e19
01:24:16.745 [ModelSessionProcess/ModelThread] INFO Starting session worker
01:24:16.746 [ModelSessionProcess/ModelThread] DEBUG Set new state State.Paused
01:24:18.181 [MainProcess/ThreadPoolExecutor-0_0] ERROR Exception calling application: Device cpu is already in use
Traceback (most recent call last):
  File "C:\Anaconda3\envs\tiktorch-server-env\lib\site-packages\grpc\_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "C:\Anaconda3\envs\tiktorch-server-env\lib\site-packages\tiktorch\server\grpc_svc.py", line 25, in CreateModelSession
    lease = self.__device_pool.lease(request.deviceIds)
  File "C:\Anaconda3\envs\tiktorch-server-env\lib\site-packages\tiktorch\server\device_pool.py", line 140, in lease
    raise Exception(f"Device {dev_id} is already in use")
Exception: Device cpu is already in use

Hi @nguyen14ck,

Apologies for the late answer.

For now, the workaround in this situation is to restart the tiktorch server. The error occurs when ilastik exits uncleanly (crashes) while connected to the tiktorch server.

Cheers
Dominik
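
For readers wondering why a crashed client can leave a device blocked, here is a minimal sketch of an exclusive-lease pool. It is not tiktorch's actual `device_pool` code: the `DevicePool` and `Lease` classes and the `release` method are hypothetical, and only the error message and the `lease(...)` / `terminate` names are taken from the traceback above. The assumption is that a device stays marked as in use until the lease's close handler runs, which never happens if the client disappears without closing its session.

```python
import threading


class DevicePool:
    """Hypothetical pool that hands out exclusive leases on device ids."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_use = set()

    def lease(self, device_ids):
        # Refuse the whole request if any requested device is already leased.
        with self._lock:
            for dev_id in device_ids:
                if dev_id in self._in_use:
                    raise Exception(f"Device {dev_id} is already in use")
            self._in_use.update(device_ids)
        return Lease(self, list(device_ids))

    def release(self, device_ids):
        # Hypothetical helper: mark the devices as free again.
        with self._lock:
            self._in_use.difference_update(device_ids)


class Lease:
    """Hypothetical lease; terminate() plays the role of the close handler."""

    def __init__(self, pool, device_ids):
        self._pool = pool
        self._device_ids = device_ids

    def terminate(self):
        self._pool.release(self._device_ids)


pool = DevicePool()
first = pool.lease(["cuda:0"])

# Simulate a client that crashed: first.terminate() is never called,
# so the next request for the same device is rejected.
try:
    pool.lease(["cuda:0"])
    second_error = None
except Exception as exc:
    second_error = str(exc)
print(second_error)

# Restarting the server corresponds to starting from an empty pool,
# which is why the restart clears the error.
restarted_pool = DevicePool()
fresh = restarted_pool.lease(["cuda:0"])
```

In this sketch the stale lease could also be cleared by calling `first.terminate()`, but that requires something on the server side to notice the client is gone; restarting the server sidesteps the question.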