bearpaw/pytorch-pose

Can anyone reproduce the claimed training accuracy with PyTorch 0.4?

Closed this issue · 10 comments

I trained with the original code and dataset on two different machines, one with a 1060 GPU and another with two 1080 Tis, but I never got an accuracy above 70%, and it grew very slowly (some people reported 20% after 2 epochs, but mine was still well below 10%). Someone in another issue mentioned that they couldn't get good performance on PyTorch 0.4.0 either, so I wonder if anyone has. I really don't want to downgrade my PyTorch version, since I've been modifying the code to implement parts of a paper that don't work on older PyTorch.

@bearpaw why does the performance of the same model and code vary so much with the test batch size? The smaller the test batch size, the better the performance: 80% with 2 samples per batch but only 50% with 6 samples per batch. I think this is why I can't get good training performance.

Alright, it really is a problem caused by the PyTorch version. In PyTorch 0.4, for example, torch.tensor(3) / torch.tensor(4) equals tensor(0), even though the source code multiplies by 1.0 to make it a float, so I only ever get tensor(0) or tensor(1) out of it. This happens in the dist_acc() function in evaluation.py. It explains why my test results get worse as the batch size grows: the ratio is tensor(1) only when every distance in the batch is below the threshold, and tensor(0) otherwise. Whether this is really tied to the PyTorch version is unconfirmed, though; I haven't tested on another version.
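The effect described above is the same as plain integer division. A minimal sketch in pure Python (not PyTorch, and with made-up counts) of how truncating division collapses the ratio, while converting to float first preserves it:

```python
below_thr = 3  # hypothetical count of valid distances below the threshold
valid = 4      # hypothetical count of valid (non -1) distances

# Integer division truncates toward zero -- analogous to dividing
# two integer tensors in PyTorch 0.4: any partial accuracy becomes 0.
truncated = below_thr // valid
print(truncated)  # 0

# Converting to float before dividing preserves the fraction.
correct = float(below_thr) / valid
print(correct)  # 0.75
```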

Hello, I have also encountered this problem. Could you explain specifically how to change the dist_acc() function in evaluation.py? Thank you for your answer.

You can modify the function dist_acc(dists, thr=0.5) in evaluation.py in the pose.utils folder. The original code multiplies by 1.0 to convert an integer tensor to a float tensor, but that doesn't work in PyTorch 0.4, which is why the training accuracy is so low: you get 1 only when everything is correct, and 0 otherwise.
To solve this, I just changed the *1.0 to .float().

Change it to this:

def dist_acc(dists, thr=0.5):
    ''' Return percentage below threshold while ignoring values with a -1 '''
    if dists.ne(-1).sum() > 0:
        return float(dists.le(thr).eq(dists.ne(-1)).sum()) / float(dists.ne(-1).sum())
    else:
        return -1

Thank you for your answer, it is very helpful to me.

Hi, @djangogo
I have changed the code according to your advice. However, the test acc still varies with the test batch size.
For example,
test batch size | acc
1               | 0.8743
6               | 0.8660
16              | 0.8685

@Bob130 try this; it solved my problem.

def dist_acc(dists, thr=0.5):
    ''' Return percentage below threshold while ignoring values with a -1 '''
    if dists.ne(-1).sum() > 0:
        return dists.le(thr).eq(dists.ne(-1)).sum().float() / dists.ne(-1).sum().float()
    else:
        return -1

@gdjmck , thank you for your advice.
I use Python 2.7; it doesn't support using .float() to transform an int into a float.
I debugged the code and found that both the original code and @djangogo's version return a float value.
But the validation acc still varies with the test batch size.
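One possible (unconfirmed) source of the remaining variation: if the overall accuracy is computed by averaging per-batch ratios, then batches containing different numbers of valid joints are weighted unequally, so the result shifts with the batch size. A small illustration with made-up counts:

```python
# Hypothetical (correct, valid) joint counts for two batches.
batches = [(2, 2), (1, 4)]

# Mean of per-batch accuracies: each batch counts equally regardless of size.
mean_of_batch_accs = sum(c / v for c, v in batches) / len(batches)  # (1.0 + 0.25) / 2

# Global accuracy: each valid joint counts equally.
global_acc = sum(c for c, _ in batches) / sum(v for _, v in batches)  # 3 / 6

print(mean_of_batch_accs, global_acc)  # the two values differ
```

If the evaluation code averages per-batch values, this mismatch alone would make the reported acc depend on the batch size even when the model and data are identical.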

@Bob130 Did you track the acc values through the functions called in evaluation.py?
For problems like this, it seems something is wrong in the evaluation module.