zeakey/skeleton

The acc for testing of hifi v2

Opened this issue · 10 comments

yeLer commented

Dear Senior:
Last time I used your Hi-Fi v2 code to train on SK-Large and tested it on the test set (750 images). I got a score of 0.68477, which is far from the result in your paper "Hi-Fi: Hierarchical Feature Integration for Skeleton Detection". I think the gap may be caused by the hyperparameters set in h2_solver.prototxt. In my first test I used the default parameters from your code. Here is the content:

  • net: "models/h2_train.prototxt"
  • base_lr: 1e-6
  • lr_policy: "step"
  • gamma: 0.5
  • iter_size: 10
  • stepsize: 4000
  • display: 10
  • max_iter: 40000
  • momentum: 0.9
  • weight_decay: 0.0002
  • snapshot: 2000
  • snapshot_prefix: "snapshot/h2feat11-skl"
  • solver_mode: GPU

So I read your paper and checked the hyperparameters before my second test. I changed the parameters like this:

  • net: "models/h2_train.prototxt"
  • base_lr: 1e-6
  • lr_policy: "step"
  • gamma: 0.1
  • iter_size: 10
  • stepsize: 10000
  • display: 10
  • max_iter: 40000
  • momentum: 0.9
  • weight_decay: 0.0002
  • snapshot: 2000
  • snapshot_prefix: "snapshot/h2feat11-skl"
  • solver_mode: GPU
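To make the change between the two runs explicit, the two solver listings can be compared key by key. The following is a minimal sketch, not part of the repository: `parse_solver` and `diff_solvers` are hypothetical helper names, and the parser only handles simple `key: value` lines like the ones pasted above (only three of the keys are repeated here for brevity).

```python
def parse_solver(text):
    """Parse simple 'key: value' lines (optionally bulleted, with # comments)."""
    settings = {}
    for line in text.splitlines():
        # Drop trailing comments, surrounding whitespace and a leading bullet.
        line = line.split("#", 1)[0].strip().lstrip("•").strip()
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def diff_solvers(old, new):
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# Values taken from the two listings above.
default = parse_solver("""
gamma: 0.5
stepsize: 4000
max_iter: 40000
""")
modified = parse_solver("""
gamma: 0.1
stepsize: 10000
max_iter: 40000
""")

changes = diff_solvers(default, modified)
print(changes)
```

Only `gamma` and `stepsize` differ between the two runs; every other setting is unchanged.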

Sadly, I got a score of 0.68974. I don't know why there is still such a gap.
Could you give me some suggestions again? If possible, can you provide a model that has already been trained?

Thank you very much!

Have you tried Hi-Fi-v1?
The h1 should be ok because I've double checked.
There may be problems with h2; I will check and fix them in the future.
Currently I'm busy with another project. Sorry for that.

I tried FSDS/Hi-Fi-1/Hi-Fi-2 using the prototxt(s) in the repo:
FSDS (150k iters, single scale): ODS=0.63202
Hi-Fi-1 (40k iters, single scale): ODS=0.66789
Hi-Fi-1 (40k iters, multi scale): ODS=0.66496
Hi-Fi-2 (20k iters, single scale): ODS=0.68659
Hi-Fi-2 (20k iters, multi scale): ODS=0.68304
I can only reproduce the FSDS results from the paper. It seems the Hi-Fi-1 results (ODS=0.703) and Hi-Fi-2 results (ODS=0.724) cannot be reproduced.

Hi @xwjabc , I've uploaded the results (including pretrained model, detection results and evaluation curve data) of Hi-Fi-1 on SK-Large dataset.

You can download it from http://data.kaizhao.net/projects/skeleton/h1-skl-multiscale-results.zip

We have retrained Hi-Fi-1 many times and the performance can be reproduced.

For Hi-Fi-2, the configurations in this repo may be a little different from our implementation, and we haven't checked them.

I also noticed that for Hi-Fi-1 your single-scale performance is nearly the same as your multi-scale performance.
This is abnormal because in our experiments there is a significant performance gap between single-scale
and multi-scale inference. Can you use forward_all.py to perform multi-scale inference and check again?

@zeakey Thank you for your reply! I will try to evaluate your results and use forward_all.py to perform multi-scale inference again.

I finished 2 experiments:
(1) Evaluate the png files in h1-skl-multiscale-results.zip: ODS=0.70151
(2) Use forward_all.py and evaluate the model in h1-skl-multiscale-results.zip: ODS=0.67672
The result in (2) is still lower than (1) but better than what I got before. These results suggest that the evaluation script works fine but forward_all.py may not work properly.
In forward_all.py, the multi-scale setting is: scales = np.array([0.25, 0.5, 1, 2]). I wonder if it should be changed to the commented-out one, #scales = np.array([0.5, 1, 1.5]), which is the setting mentioned in the paper.

@xwjabc Yes, we use scales=[1/2, 1, 3/2] as described in our paper. The code was incorrect and has been updated.
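For readers unfamiliar with multi-scale inference, here is a rough sketch of the general idea with scales [0.5, 1, 1.5]: run the model on resized copies of the image, resize each output back to the original size, and average. This is not the code in forward_all.py. `predict` is a hypothetical stand-in for the network forward pass, and nearest-neighbour resizing is used only to keep the example dependency-free (a real pipeline would more likely use bilinear interpolation).

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    h, w = img.shape[:2]
    rows = np.minimum((np.arange(out_h) * h / out_h).astype(int), h - 1)
    cols = np.minimum((np.arange(out_w) * w / out_w).astype(int), w - 1)
    return img[rows][:, cols]


def multiscale_predict(image, predict, scales=(0.5, 1.0, 1.5)):
    """Average single-channel predictions made at several input scales.

    `predict` is assumed to map an (h, w, C) image to an (h, w) probability map.
    """
    h, w = image.shape[:2]
    acc = np.zeros((h, w), dtype=np.float64)
    for s in scales:
        sh, sw = max(1, int(round(h * s))), max(1, int(round(w * s)))
        scaled = resize_nearest(image, sh, sw)
        prob = predict(scaled)              # prediction at the scaled resolution
        acc += resize_nearest(prob, h, w)   # bring it back to the original size
    return acc / len(scales)


# Toy usage: a dummy "model" that returns the per-pixel channel mean.
dummy_image = np.ones((8, 6, 3))
result = multiscale_predict(dummy_image, lambda x: x.mean(axis=2))
print(result.shape)
```

Including a 0.25 or 2 scale changes which resolutions contribute to the average, which is one plausible reason the two scale lists give different scores.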

If you train h1 yourself, you may get a performance of about 0.701~0.703.

I changed the scales and re-evaluated the models:
(1) The model in h1-skl-multiscale-results.zip: ODS=0.70153
(2) The model I trained before (Hi-Fi-1): ODS = 0.68294
(3) The model I trained before (Hi-Fi-2): ODS = 0.70078
@yeLer mentioned differences between the paper and the repository, which I have annotated below. Should we change the parameters?

net: "models/h1_train.prototxt"
base_lr: 1e-6
lr_policy: "step"
gamma: 0.5 # Paper uses 0.1
iter_size: 10 # Paper only mentioned batch_size = 1, so how about iter_size? FSDS prototxt uses 2 here.
stepsize: 4000 # Paper uses 10000
display: 5
max_iter: 40000
momentum: 0.9
weight_decay: 0.0002
snapshot: 2000
snapshot_prefix: "snapshot/h1-skl"
solver_mode: GPU
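On the iter_size question: in Caffe, iter_size makes the solver accumulate gradients over that many forward/backward passes before each weight update, so the effective batch size is batch_size * iter_size. A small sketch of the arithmetic, assuming batch_size = 1 as stated in the paper (the function names are only illustrative):

```python
def effective_batch_size(batch_size, iter_size):
    """Images contributing to one weight update under gradient accumulation."""
    return batch_size * iter_size


def images_seen(max_iter, batch_size, iter_size):
    """Total images processed over a run of max_iter solver iterations."""
    return max_iter * effective_batch_size(batch_size, iter_size)


# Hi-Fi solver in this thread: iter_size 10; the FSDS prototxt reportedly uses 2.
for name, iter_size in [("Hi-Fi (iter_size=10)", 10), ("FSDS (iter_size=2)", 2)]:
    print(name,
          "effective batch:", effective_batch_size(1, iter_size),
          "images over 40k iters:", images_seen(40000, 1, iter_size))
```

So two solvers with the same max_iter but different iter_size values process a different number of training images in total.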

By the way, I just saw h1_solver.prototxt is updated. Should we use the prototxt below?

net: "models/h1_train.prototxt"
base_lr: 1e-6
lr_policy: "step"
gamma: 0.5
iter_size: 10
stepsize: 20000
display: 10
max_iter: 40000
momentum: 0.9
weight_decay: 0.0002
snapshot: 5000
snapshot_prefix: "snapshot/h1-skl"
solver_mode: GPU

In my experience, the performance is not sensitive to gamma and iter_size.
Actually, these hyperparameters are copied from HED.

You can decrease the lr every 20k iters using gamma=0.1, or every 5k iters using gamma=0.5; I think the difference will be negligible.

If you want to check your training quickly, I suggest training for 20k iterations in total and decreasing the lr by gamma=0.1 after 10k iterations.
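The schedules mentioned above can be compared numerically. Caffe's "step" policy computes the learning rate as base_lr * gamma ^ floor(iter / stepsize); the sketch below evaluates that formula for the three variants, using base_lr = 1e-6 from the solver files. The 40k max_iter for the first two schedules is an assumption taken from the solver files in this thread, and the sketch only shows the learning rates, not the resulting accuracy.

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate under a Caffe-style "step" policy."""
    return base_lr * gamma ** (iteration // stepsize)


BASE_LR = 1e-6

# (label, gamma, stepsize, max_iter)
schedules = [
    ("gamma=0.1 every 20k iters", 0.1, 20000, 40000),
    ("gamma=0.5 every 5k iters", 0.5, 5000, 40000),
    ("quick run: gamma=0.1 at 10k, 20k total", 0.1, 10000, 20000),
]

for label, gamma, stepsize, max_iter in schedules:
    # Learning rate in effect during the last training iteration.
    final_lr = step_lr(BASE_LR, gamma, stepsize, max_iter - 1)
    print(f"{label}: final lr = {final_lr:.3e}")
```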

If you want to follow the exact hyper-params we use in the paper, they are:

net: "models/h1_train.prototxt"
base_lr: 1e-6
lr_policy: "step"
gamma: 0.1
iter_size: 10
stepsize: 20000
display: 10
max_iter: 40000
momentum: 0.9
weight_decay: 0.0002
snapshot: 5000
snapshot_prefix: "snapshot/h1-skl"
solver_mode: GPU

We train for 40k iterations to guarantee convergence, although that many iterations are not necessary.

Got it. I saw the repo is updated to the exact hyper-params you mentioned. I will use this setting to train the model. Thank you for your great help!

Finally, I got ODS=0.70064 for Hi-Fi-1 with the updated prototxt (one trial finished yesterday).