hr_res101('train') error : "Error using gpuDevice (line 26) Invalid CUDA device id"
niamul070 opened this issue · 2 comments
When I run hr_res101('train'), I get the error mentioned above. Can you tell me how to fix it? Below is the detailed output and error message:
hr_res101('train');
ans =
models/widerface-resnet-101-simple-sample256-posfrac0.5-N25-bboxreg-cluster-scaled
Trying to initialize the structure of resnet-101-simple
Unknown model: cannot initialize.
Loading pretrained weights from ./trained_models/imagenet-resnet-101-dag.mat
Loaded imdb from data/widerface/imdb.mat
cluster path: data/widerface/RefBox_N25_scaled.mat
opts =
struct with fields:
keepDilatedZeros: 0
inputSize: [500 500]
learningRate: [1×30 double]
trainFn: '@cnn_train_dag_hardmine'
batchGetterFn: '@cnn_get_batch_hardmine'
freezeResNet: 0
tag: ''
clusterNum: 25
clusterName: 'scaled'
bboxReg: 1
skipLRMult: [0 1 0.1000]
sampleSize: 256
posFraction: 0.5000
posThresh: 0.7000
negThresh: 0.3000
border: [0 0]
pretrainModelPath: './trained_models/imagenet-resnet-101-dag.mat'
dataDir: 'data/widerface'
modelType: 'resnet-101-simple'
networkType: 'dagnn'
batchNormalization: 1
weightInitMethod: 'gaussian'
minClusterSize: [10 10]
maxClusterSize: [Inf Inf]
expDir: 'models/widerface-resnet-101-simple-sample256-posf...'
batchSize: 48
numSubBatches: 1
numEpochs: 50
gpus: [1 2 3 4]
numFetchThreads: 8
lite: 0
imdbPath: 'data/widerface/imdb.mat'
train: [1×1 struct]
ans =
struct with fields:
gpus: [1 2 3 4]
batchSize: 48
numSubBatches: 1
numEpochs: 50
learningRate: [1×30 double]
keepDilatedZeros: 0
Start using dagnn.DetLoss for loss
Starting parallel pool (parpool) using the 'local' profile ...
Warning: The system time zone setting, 'US/Eastern', does not specify a single time zone unambiguously. It will be treated as 'America/New_York'. See the datetime.TimeZone property for details about specifying time zones.
In verifyTimeZone (line 23)
In datetime (line 503)
In parallel.internal.cluster.FileSerializer>iLoadDate (line 345)
In parallel.internal.cluster.FileSerializer/getFields (line 100)
In parallel.internal.cluster.CJSSupport/getProperties (line 252)
In parallel.internal.cluster.CJSSupport/getJobProperties (line 463)
In parallel.internal.cluster.CJSJobMixin/hGetProperty (line 70)
In parallel.internal.cluster.CJSJobMixin/hSetTerminalStateFromCluster (line 98)
In parallel.cluster.CJSCluster/hGetJobState (line 361)
In parallel.internal.cluster.CJSJobMixin/getStateEnum (line 136)
In parallel.Job/get.StateEnum (line 214)
In parallel.Job/get.State (line 206)
In parallel.internal.customattr.CustomGetSet>iVectorisedGetHelper (line 107)
In parallel.internal.customattr.CustomGetSet>@(a,b,c)iVectorisedGetHelper(obj,a,b,c) (line 89)
In parallel.internal.customattr.CustomGetSet/doVectorisedGet (line 90)
In parallel.internal.customattr.CustomGetSet/hVectorisedGet (line 64)
In parallel.internal.customattr.GetSetImpl>iAccessProperties (line 289)
In parallel.internal.customattr.GetSetImpl>iGetAllProperties (line 250)
In parallel.internal.customattr.GetSetImpl.getImpl (line 124)
In parallel.internal.customattr.CustomGetSet/get (line 30)
In parallel.internal.pool.InteractiveClient/pRemoveOldJobs (line 464)
In parallel.internal.pool.InteractiveClient/start (line 311)
In parallel.Pool>iStartClient (line 567)
In parallel.Pool.hBuildPool (line 446)
In parallel.internal.pool.doParpool (line 15)
In parpool (line 89)
In cnn_train_dag_hardmine>prepareGPUs (line 604)
In cnn_train_dag_hardmine (line 132)
In cnn_widerface (line 212)
In hr_res101 (line 41)
connected to 4 workers.
cnn_train_dag_hardmine: resetting GPU
Error using cnn_train_dag_hardmine>prepareGPUs (line 616)
Error detected on worker 3.
Error in cnn_train_dag_hardmine (line 132)
prepareGPUs(opts, epoch == start+1) ;
Error in cnn_widerface (line 212)
[net, info] = trainFn(net, imdb, getBatchFn(batchGetter, opts, net.meta), ...
Error in hr_res101 (line 41)
cnn_widerface('inputSize', inputSize, ...
Caused by:
Error using gpuDevice (line 26)
Invalid CUDA device id: 3. Select a device id from the range 1:1.
When I run gpuDevice from the MATLAB prompt, this is what I get:
gpuDevice
ans =
CUDADevice with properties:
Name: 'Quadro M4000'
Index: 1
ComputeCapability: '5.2'
SupportsDouble: 1
DriverVersion: 8
ToolkitVersion: 7.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8.4922e+09
AvailableMemory: 7.5519e+09
MultiprocessorCount: 13
ClockRateKHz: 772500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
Never mind, I solved it. Thanks.
I'm facing the same problem. Can you please tell me how you resolved this issue?
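The fix used above was never posted, but the log points to a likely cause: the options show `gpus: [1 2 3 4]`, while the error says `Select a device id from the range 1:1` and `gpuDevice` lists a single Quadro M4000. So the training code seems to be asking for four GPUs on a machine where MATLAB sees only one. Below is a minimal sketch of how the requested ids could be trimmed to the devices that exist. The helper name `selectUsableGpus` and its arguments are hypothetical and not part of this repository, and I am assuming the `gpus` value can be changed wherever it is set (probably in `hr_res101.m` or `cnn_widerface.m`).

```matlab
function gpus = selectUsableGpus(requested, numAvailable)
% selectUsableGpus  Keep only the GPU ids that exist on this machine.
%   Hypothetical helper, not part of the repository.
%   requested    - vector of 1-based GPU ids, e.g. [1 2 3 4] as in opts.gpus
%   numAvailable - number of CUDA devices MATLAB can see, e.g. gpuDeviceCount
%
%   MATLAB GPU ids start at 1, so valid ids lie in the range 1:numAvailable.
gpus = requested(requested >= 1 & requested <= numAvailable);
end
```

With one visible device, `selectUsableGpus([1 2 3 4], gpuDeviceCount)` returns `1`, which is the only id that `gpuDevice` would accept here. Setting `gpus` to that value (or simply to `1`) should avoid the "Invalid CUDA device id" error, although with a single GPU the batch size of 48 may also need to be reduced to fit in memory.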