bairdzhang/smallhardface

Hi, a question about running the training

Closed this issue · 9 comments

You set MODEL.DIFFERENT_DILATION.ENABLE = true in smallhardface.toml, but this makes manipulate.py return None and the training cannot run. When I change it to false, the training runs. So why do you make it return None in manipulate.py?

I have run 30K iterations, and the loss stays around 1.0.

Hi, we don't use any value returned by manipulate_train; it always returns None. Can you please check $ROOT/output/face/wider_train/face_$TIME/stderr.log when the training cannot run?

How many GPUs are you using? I didn't have the issue of the loss not going down. Maybe we should fix the MODEL.DIFFERENT_DILATION.ENABLE = true problem first to see if it helps.

Yes, I have checked stderr.log. When I set MODEL.DIFFERENT_DILATION.ENABLE = true, I get:

File "train_test.py", line 89 in
manipulate_train(cfg.TRAIN.PROTOTXT, target_train)
File "lib/prototxt/manipulate.py", line 42, in manipulate_train
train_pb = _add_dimension_reduction(train_pb)
File "lib/prototxt/manipulate.py", line 185, in _add_dimension_reduction
] + pb.layer[split:]
File "lib/prototxt/manipulate.py", line 124, in _simple_conv_layer
conv_layer.param.MergeFrom([caffe_pb2.ParamSpec()] * 2)
File "/opt/anaconda2/lib/python2.7/site-packages/google/protobuf/internal/containers.py", line 397, in MergeFrom
self.extend(other._values)
AttributeError: 'list' object has no attribute '_values'

The prototxt path is right, so I have no idea what is causing this error.

Hi, I am not sure why this happens. I can run the code without problems on my machine. Maybe some package versions (e.g. protobuf) are different.

For a quick fix, can you change this line (https://github.com/bairdzhang/smallhardface/blob/master/lib/prototxt/manipulate.py#L121) to conv_layer.param.extend([caffe_pb2.ParamSpec()] * 2)?

Also, you may want to change this line (https://github.com/bairdzhang/smallhardface/blob/master/lib/prototxt/manipulate.py#L187) to pb.layer.extend(new_layers) if a similar error happens there. Please let me know if it works, thanks!
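
For context, here is a minimal sketch (not the repo's code) of why extend() is the safer call than MergeFrom() when appending to a protobuf repeated message field. It assumes Caffe's Python protobuf bindings (caffe.proto.caffe_pb2) are importable, as in manipulate.py; the layer name and lr_mult values are placeholders for illustration.

```python
# Minimal sketch, assuming pycaffe / caffe_pb2 is installed.
from caffe.proto import caffe_pb2

conv_layer = caffe_pb2.LayerParameter(name='conv_example', type='Convolution')

# On some protobuf versions, MergeFrom on a repeated field expects another
# repeated-field container and accesses other._values internally, so passing
# a plain Python list raises:
#   AttributeError: 'list' object has no attribute '_values'
# conv_layer.param.MergeFrom([caffe_pb2.ParamSpec()] * 2)   # may fail

# extend() accepts any iterable of ParamSpec messages and works across versions.
conv_layer.param.extend([caffe_pb2.ParamSpec()] * 2)

# The two ParamSpec entries can then be configured as usual, e.g. learning-rate
# multipliers for weights and bias (placeholder values).
conv_layer.param[0].lr_mult = 1.0
conv_layer.param[1].lr_mult = 2.0

print(conv_layer)
```

The same reasoning applies to the second suggested change: pb.layer is also a repeated field, so pb.layer.extend(new_layers) appends the new layers without relying on MergeFrom's container-type check.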

Thank you very much! It works!

Does the loss go down? In my experiments, the loss goes below 1 quickly (within 100 iterations). Here is my stderr.log file for your reference (https://livejohnshopkins-my.sharepoint.com/:u:/g/personal/zzhang99_jh_edu/EcxhNa4thKtAq4T0fMAx6VQBsWPLjmKhq0rHWYJpLVcc6Q?e=gBxaQE)

Yes, the loss behaves the same as yours. I'm waiting for the eval now.

How many GPUs are you using? What is the GPU and CPU utilization when it goes slow? I never had this issue before. Did you notice anything suspicious in stderr.log?