duanyiqun/Auto-ReID-Fast

Triplet loss?

Closed this issue · 4 comments

Hello! Thank you for your contribution.
I would like to know whether, in distributed training, the triplet loss communicates across processes.
That is, is the hardest positive (negative) sample selected within each process or across all processes?

Please see the `dis_class_sampler`.
The triplet sampler does not select positive/negative samples across processes; sampling is limited to each process's own mini-batch.
Any advice?

I have implemented a version using the distributed utilities that come with PyTorch.
In PyTorch's distributed training, the gradients of all processes are averaged in each process during the backward pass. So for the triplet loss, the features from every process can be collected with `dist.all_gather` after the forward pass, and the triplet loss can then be computed over the gathered features.
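A minimal sketch of this idea (the helper names and the margin value are mine; note that `dist.all_gather` returns copies without autograd history, so the local tensor is re-inserted to keep its gradients):

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def all_gather_with_grad(t):
    # dist.all_gather returns detached copies, so put the local tensor
    # (which still carries its autograd history) back into its slot.
    out = [torch.zeros_like(t) for _ in range(dist.get_world_size())]
    dist.all_gather(out, t)
    out[dist.get_rank()] = t
    return torch.cat(out, dim=0)

def batch_hard_triplet(feats, labels, margin=0.3):
    # Pairwise Euclidean distances over the *global* batch.
    d = torch.cdist(feats, feats)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    # Hardest positive: farthest sample sharing the label.
    d_ap = d.masked_fill(~same, 0.0).max(dim=1).values
    # Hardest negative: closest sample with a different label.
    d_an = d.masked_fill(same, float("inf")).min(dim=1).values
    return F.relu(d_ap - d_an + margin).mean()

# In the training step, after the forward pass on the local batch:
#   feats  = all_gather_with_grad(local_feats)    # (world_size * B, D)
#   labels = all_gather_with_grad(local_labels)   # labels carry no grad
#   loss   = batch_hard_triplet(feats, labels)
```

Each rank only back-propagates through its own slice of the gathered batch; the gradient averaging done by DDP then covers the full global batch.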
During the architecture search, the weight of the "none" operation in the normal cell quickly converges to 1. In that case, the network structure derived afterwards is close to a random search. My reproduced results are currently very poor. Are your search results still okay?
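To illustrate the "similar to random search" point above (a toy sketch with made-up alpha values and op names; DARTS excludes "none" when discretizing, so the argmax is taken over the near-uniform leftover weights):

```python
import torch

OPS = ["none", "skip_connect", "sep_conv_3x3", "max_pool_3x3"]
alpha = torch.tensor([8.0, 0.02, 0.01, 0.00])  # "none" has come to dominate
w = torch.softmax(alpha, dim=0)
print(w)  # ~[0.999, ...]: the non-"none" ops share a tiny, nearly uniform mass

# DARTS drops "none" when deriving the discrete cell, so the choice among
# the remaining ops is driven by noise-level differences in their weights.
best = max((i for i, op in enumerate(OPS) if op != "none"), key=lambda i: w[i].item())
print(OPS[best])
```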

I just used the retrieval loss to search the DARTS search space. The normal cell and reduction cell easily collapse to the following results.
[image: the searched normal and reduction cells]
Does your search process have this problem?

I think the results were OK previously. Although they may not be that high, they are no worse than the baseline model. Have you ever tried running DARTS with only a standard loss such as cross-entropy?
Perhaps you could try manually synchronizing the gradients and the batch-normalization parameters.
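For the batch-norm part, one option is PyTorch's built-in `SyncBatchNorm`; a minimal sketch, assuming the process group is already initialized and `local_rank` identifies this process's GPU:

```python
import torch
import torch.nn as nn

def prepare_ddp_model(model: nn.Module, local_rank: int) -> nn.Module:
    model = model.cuda(local_rank)
    # Replace every BatchNorm layer with SyncBatchNorm so running statistics
    # are computed over the global batch instead of per-process batches.
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    # DDP averages gradients across processes during the backward pass.
    return nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```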