How to run BPA for ViTs?
Hello, authors! Thank you for your excellent work. I noticed that you report the performance of BPA on ViTs, but only the code of BPA for CNNs has been open-sourced. Would you mind releasing the relevant code for ViTs?
Thank you very much!
I look forward to your reply.
BPA identifies that non-linear layers (e.g., ReLU, max-pooling) truncate the gradient during backward propagation. Based on this finding, it adopts a non-monotonic function as the derivative of ReLU and incorporates softmax to smooth the derivative of max-pooling. However, ReLU and max-pooling are not widely adopted in ViTs, so these two modifications have little to act on there. Hence, we do not provide the code for ViTs.
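For readers who want a concrete picture of the two modifications described above, here is a minimal sketch in plain Python. It is not the released implementation. It assumes the non-monotonic function is the derivative of SiLU, and the function names and the `temperature` parameter are hypothetical, chosen only for illustration.

```python
import math


def relu_grad(x):
    """Exact ReLU derivative: zero for x <= 0, so the gradient is truncated there."""
    return 1.0 if x > 0 else 0.0


def surrogate_relu_grad(x):
    """Assumed surrogate: the derivative of SiLU, x * sigmoid(x).

    It is non-monotonic and nonzero for negative inputs, so some gradient
    still flows where the exact ReLU derivative would be zero.
    """
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 + x * (1.0 - s))


def maxpool_grad(window):
    """Exact max-pooling backward weights: all gradient goes to the max element."""
    top = max(range(len(window)), key=window.__getitem__)
    return [1.0 if i == top else 0.0 for i in range(len(window))]


def softmax_maxpool_grad(window, temperature=1.0):
    """Softmax-smoothed backward weights over one pooling window.

    `temperature` is a hypothetical knob: smaller values approach the
    exact one-hot routing of max-pooling.
    """
    top = max(window)
    # Subtract the max before exponentiating for numerical stability.
    exps = [math.exp((v - top) / temperature) for v in window]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    window = [1.0, 2.0, 3.0, 4.0]
    print(relu_grad(-1.0), surrogate_relu_grad(-1.0))
    print(maxpool_grad(window))
    print(softmax_maxpool_grad(window))
```

In both cases the forward pass is unchanged and only the backward weights differ: the surrogate ReLU derivative passes a small gradient for negative inputs, and the softmax weights spread the gradient across the whole pooling window instead of a single element.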
Thank you for your kind response. My issue has been well addressed. Have a wonderful day!