ydwen/opensphere

Why use no_grad for computing d_theta

tangzhongliang opened this issue · 1 comments

Hello, I found that this code disables gradients when computing g_cos_theta and the angular margin:
https://github.com/ydwen/opensphere/blob/main/model/head/sphereface2.py#L62-L80

Will this not cause network oscillation?

@tangzhongliang I think this is the Characteristic Gradient Detachment proposed in the paper "SphereFace Revived: Unifying Hyperspherical Face Recognition" to stabilize training.
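
For illustration, here is a minimal sketch of the detachment pattern in PyTorch. It is not the repository's code: the helper name `detached_margin_logits`, the additive angular margin, and the values of `m` and `s` are assumptions chosen for the example. The margin adjustment `d_theta` is computed under `torch.no_grad()`, so the forward pass uses the margin-adjusted logits while the backward pass only sees the gradient of `s * cos_theta`.

```python
import torch

def detached_margin_logits(cos_theta, label, m=0.4, s=30.0):
    """Hypothetical helper: additive angular margin with a detached adjustment."""
    with torch.no_grad():
        # Angles recovered from the cosines; clamp keeps acos numerically safe.
        theta = torch.acos(cos_theta.clamp(-1 + 1e-5, 1 - 1e-5))
        # One-hot mask so the margin is applied to the target class only.
        one_hot = torch.zeros_like(cos_theta)
        one_hot.scatter_(1, label.view(-1, 1), 1.0)
        # Difference between the margin-adjusted cosine and the plain cosine.
        d_theta = torch.cos(theta + m * one_hot) - cos_theta
    # d_theta is a constant for autograd, so gradients flow through cos_theta only.
    return s * (cos_theta + d_theta)

cos_theta = torch.tensor([[0.8, 0.1, -0.3]], requires_grad=True)
label = torch.tensor([0])
logits = detached_margin_logits(cos_theta, label)
logits.sum().backward()
```

In this sketch the forward value of the target logit is `s * cos(theta + m)`, but `cos_theta.grad` equals `s` for every entry, because the margin term contributes nothing to the backward pass.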