The acc for testing of hifi v2
Dear Senior:
Last time I used your Hi-Fi v2 code to train on SK-Large and tested it on the test set (750 images). I got a score of 0.68477, which is far from the result in your paper "Hi-Fi: Hierarchical Feature Integration for Skeleton Detection". I think the cause may be the hyperparameters set in h2_solver.prototxt. In my first test I used the default parameters from your code. Here is the content:
- net: "models/h2_train.prototxt"
- base_lr: 1e-6
- lr_policy: "step"
- gamma: 0.5
- iter_size: 10
- stepsize: 4000
- display: 10
- max_iter: 40000
- momentum: 0.9
- weight_decay: 0.0002
- snapshot: 2000
- snapshot_prefix: "snapshot/h2feat11-skl"
- solver_mode: GPU
So I read your paper and checked the hyperparameters before my second test. I changed the parameters like this:
- net: "models/h2_train.prototxt"
- base_lr: 1e-6
- lr_policy: "step"
- gamma: 0.1
- iter_size: 10
- stepsize: 10000
- display: 10
- max_iter: 40000
- momentum: 0.9
- weight_decay: 0.0002
- snapshot: 2000
- snapshot_prefix: "snapshot/h2feat11-skl"
- solver_mode: GPU
Sadly, I got a score of 0.68974. I don't know why there is still such a gap.
Could you give me some suggestions again? If possible, could you provide a model that has already been trained?
Thank you very much!
Have you tried Hi-Fi-v1?
The h1 should be ok because I've double checked.
There may be problems with h2; I will check and fix them in the future.
I'm currently busy with another project. Sorry for that.
I tried FSDS/Hi-Fi-1/Hi-Fi-2 using the prototxt(s) in the repo:
FSDS (150k iters, single scale): ODS=0.63202
Hi-Fi-1 (40k iters, single scale): ODS=0.66789
Hi-Fi-1 (40k iters, multi scale): ODS=0.66496
Hi-Fi-2 (20k iters, single scale): ODS=0.68659
Hi-Fi-2 (20k iters, multi scale): ODS=0.68304
I can only reproduce the FSDS results from the paper. It seems the Hi-Fi-1 results (ODS=0.703) and Hi-Fi-2 results (ODS=0.724) cannot be reproduced.
Hi @xwjabc, I've uploaded the Hi-Fi-1 results on the SK-Large dataset (including the pretrained model, detection results and evaluation curve data).
You can download it from http://data.kaizhao.net/projects/skeleton/h1-skl-multiscale-results.zip
We have retrained Hi-Fi-1 many times and the performance can be reproduced.
For Hi-Fi-2, the configurations in this repo may be a little different from our implementation, and we haven't checked them.
I also noticed that your single-scale performance for Hi-Fi-1 is nearly the same as your multi-scale performance.
This is abnormal, because in our experiments there is a significant performance gap between single-scale and multi-scale inference. Can you use forward_all.py to perform multi-scale inference and check again?
@zeakey Thank you for your reply! I will try to evaluate your results and use forward_all.py to perform multi-scale inference again.
I finished 2 experiments:
(1) Evaluate the png files in h1-skl-multiscale-results.zip: ODS=0.70151
(2) Use forward_all.py and evaluate the model in h1-skl-multiscale-results.zip: ODS=0.67672
The result in (2) is still lower than (1) but better than what I got before. This suggests that the evaluation script works fine but forward_all.py may not work properly.
In forward_all.py, the multi-scale setting is scales = np.array([0.25, 0.5, 1, 2]). I wonder if it should be changed to the commented-out line, #scales = np.array([0.5, 1, 1.5]), which is the setting mentioned in the paper.
@xwjabc Yes, we use scales=[1/2, 1, 3/2] as described in our paper. The code was incorrect and has been updated.
If you train h1 yourself, you should get a performance of about 0.701~0.703.
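For readers following along, here is a minimal sketch of what multi-scale inference with scales 0.5, 1 and 1.5 could look like. It is not the repository's forward_all.py: the `predict` callable, the helper names, the nearest-neighbour resize and the plain averaging are all assumptions made for illustration, and the real script may resize and fuse differently.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array (stand-in for a real image resize)."""
    in_h, in_w = img.shape
    # Map every output row/column back to its source index.
    rows = np.minimum((np.arange(out_h) * in_h / out_h).astype(int), in_h - 1)
    cols = np.minimum((np.arange(out_w) * in_w / out_w).astype(int), in_w - 1)
    return img[rows[:, None], cols[None, :]]

def multiscale_predict(img, predict, scales=(0.5, 1.0, 1.5)):
    """Run `predict` (hypothetical: 2-D array -> 2-D map of the same size) at each
    scale, resize every output back to the input size, and average them."""
    h, w = img.shape
    fused = np.zeros((h, w), dtype=np.float64)
    for s in scales:
        sh, sw = max(1, int(round(h * s))), max(1, int(round(w * s)))
        pred = predict(resize_nearest(img, sh, sw))
        fused += resize_nearest(pred, h, w)
    return fused / len(scales)

# Usage with a dummy network that returns its input unchanged.
image = np.full((8, 10), 0.25)
result = multiscale_predict(image, lambda x: x)
print(result.shape)
```

With the old setting [0.25, 0.5, 1, 2], the same loop would also run the network on a heavily downscaled and a 2x upscaled copy, which is the difference between the two evaluations reported above.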
I changed the scales and re-evaluated the models:
(1) The model in h1-skl-multiscale-results.zip: ODS=0.70153
(2) The model I trained before (Hi-Fi-1): ODS = 0.68294
(3) The model I trained before (Hi-Fi-2): ODS = 0.70078
@yeLer mentioned the differences between the paper and the repository, as shown below. Should we change the parameters?
net: "models/h1_train.prototxt"
base_lr: 1e-6
lr_policy: "step"
gamma: 0.5 # Paper uses 0.1
iter_size: 10 # Paper only mentioned batch_size = 1, so how about iter_size? FSDS prototxt uses 2 here.
stepsize: 4000 # Paper uses 10000
display: 5
max_iter: 40000
momentum: 0.9
weight_decay: 0.0002
snapshot: 2000
snapshot_prefix: "snapshot/h1-skl"
solver_mode: GPU
By the way, I just saw that h1_solver.prototxt has been updated. Should we use the prototxt below?
net: "models/h1_train.prototxt"
base_lr: 1e-6
lr_policy: "step"
gamma: 0.5
iter_size: 10
stepsize: 20000
display: 10
max_iter: 40000
momentum: 0.9
weight_decay: 0.0002
snapshot: 5000
snapshot_prefix: "snapshot/h1-skl"
solver_mode: GPU
In my experience, the performance is not sensitive to gamma and iter_size.
Actually, these hyper-params are copied from HED.
You can decrease the lr every 20k iters using gamma=0.1, or decrease it every 5k iters using gamma=0.5; I think the difference will be negligible.
If you want to check your training quickly, I suggest training for 20k iterations in total and decreasing the lr by gamma=0.1 after 10k iterations.
If you want to follow the exact hyper-params we use in the paper, they are:
net: "models/h1_train.prototxt"
base_lr: 1e-6
lr_policy: "step"
gamma: 0.1
iter_size: 10
stepsize: 20000
display: 10
max_iter: 40000
momentum: 0.9
weight_decay: 0.0002
snapshot: 5000
snapshot_prefix: "snapshot/h1-skl"
solver_mode: GPU
We train for 40k iterations to guarantee convergence, although that many is not necessary.
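To compare the schedules discussed in this thread, here is a small sketch of Caffe's "step" policy, where the learning rate is base_lr * gamma^floor(iter / stepsize). The function name and the schedule labels are ours, chosen only for illustration.

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate under the "step" policy: base_lr * gamma^floor(iter / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)

# (gamma, stepsize) pairs taken from the solver settings quoted in this thread.
schedules = {
    "gamma=0.5 every 4k (earlier repo default)": (0.5, 4000),
    "gamma=0.5 every 5k": (0.5, 5000),
    "gamma=0.1 every 20k (paper setting)": (0.1, 20000),
}

for name, (gamma, stepsize) in schedules.items():
    # Sample the schedule at a few iterations up to max_iter = 40000.
    rates = [step_lr(1e-6, gamma, stepsize, it) for it in (0, 10000, 20000, 39999)]
    print(name, ["%.2e" % r for r in rates])
```

By the last iteration, the 4k schedule has halved the rate nine times (a factor of 1/512) and the 5k schedule seven times (1/128), while the 20k schedule has applied a single 10x drop.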
Got it. I saw the repo is updated to the exact hyper-params you mentioned. I will use this setting to train the model. Thank you for your great help!
Finally, I got ODS=0.70064 for Hi-Fi-1 with the updated prototxt (one trial finished yesterday).