bairdzhang/smallhardface

Hi, a question about running the training

Closed this issue · 9 comments

You set MODEL.DIFFERENT_DILATION.ENABLE = true in smallhardface.toml, but this makes manipulate.py return None and the training cannot run. When I change it to false, the training runs. So why do you make it return None in manipulate.py?

I have run 30K iterations, and the loss stays around 1.0.

Hi, we don't use any value returned by manipulate_train; it always returns None. Can you please check $ROOT/output/face/wider_train/face_$TIME/stderr.log when the training cannot run?

How many GPUs are you using? I didn't have the issue of the loss not going down. Maybe we should fix the MODEL.DIFFERENT_DILATION.ENABLE = true problem first to see if it helps.

Yes, I have checked stderr.log. When I set MODEL.DIFFERENT_DILATION.ENABLE = true, I get:

File "train_test.py", line 89 in
manipulate_train(cfg.TRAIN.PROTOTXT, target_train)
File "lib/prototxt/manipulate.py", line 42, in manipulate_train
train_pb = _add_dimension_reduction(train_pb)
File "lib/prototxt/manipulate.py", line 185, in _add_dimension_reduction
] + pb.layer[split:]
File "lib/prototxt/manipulate.py", line 124, in _simple_conv_layer
conv_layer.param.MergeFrom([caffe_pb2.ParamSpec()] * 2)
File "/opt/anaconda2/lib/python2.7/site-packages/google/protobuf/internal/containers.py", line 397, in MergeFrom
self.extend(other._values)
AttributeError: 'list' object has no attribute '_values'

The prototxt path is right, so I have no idea what is causing this error.

Hi, I am not sure why this happens. I can run the code without problems on my machine. Maybe some package versions (e.g. protobuf) are different.

For a quick fix, can you change this line (https://github.com/bairdzhang/smallhardface/blob/master/lib/prototxt/manipulate.py#L121) to conv_layer.param.extend([caffe_pb2.ParamSpec()] * 2)?

Also, you may want to change this line (https://github.com/bairdzhang/smallhardface/blob/master/lib/prototxt/manipulate.py#L187) to pb.layer.extend(new_layers) if a similar error happens there. Please let me know if it works, thanks!
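
For context, here is a minimal sketch (not the repo's code) of why extend() is the safer call than MergeFrom() when appending to a protobuf repeated message field. It assumes Caffe's Python protobuf bindings (caffe.proto.caffe_pb2) are importable, as in manipulate.py; the layer name and lr_mult values are placeholders for illustration.

```python
# Minimal sketch, assuming pycaffe / caffe_pb2 is installed.
from caffe.proto import caffe_pb2

conv_layer = caffe_pb2.LayerParameter(name='conv_example', type='Convolution')

# On some protobuf versions, MergeFrom on a repeated field expects another
# repeated-field container and accesses other._values internally, so passing
# a plain Python list raises:
#   AttributeError: 'list' object has no attribute '_values'
# conv_layer.param.MergeFrom([caffe_pb2.ParamSpec()] * 2)   # may fail

# extend() accepts any iterable of ParamSpec messages and works across versions.
conv_layer.param.extend([caffe_pb2.ParamSpec()] * 2)

# The two ParamSpec entries can then be configured as usual, e.g. learning-rate
# multipliers for weights and bias (placeholder values).
conv_layer.param[0].lr_mult = 1.0
conv_layer.param[1].lr_mult = 2.0

print(conv_layer)
```

The same reasoning applies to the second suggested change: pb.layer is also a repeated field, so pb.layer.extend(new_layers) appends the new layers without relying on MergeFrom's container-type check.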

Thank you very much! It works!

Does the loss go down? In my experiments, the loss goes below 1 quickly (within 100 iterations). Here is my stderr.log file for your reference (https://livejohnshopkins-my.sharepoint.com/:u:/g/personal/zzhang99_jh_edu/EcxhNa4thKtAq4T0fMAx6VQBsWPLjmKhq0rHWYJpLVcc6Q?e=gBxaQE)

Yes, the loss behaves the same as yours. I'm waiting for the eval now.

How many GPUs are you using? What is the GPU and CPU utilization when it goes slow? I never had this issue before. Did you notice anything suspicious in stderr.log?