mit-han-lab/hardware-aware-transformers

Does the generated latency include the embedding lookup table and the last output layers?

leo038 opened this issue · 2 comments

According to the code, the generated latency should include the embedding lookup table and the last output layers. However, I ran into a problem: I trained a latency predictor, and it is very accurate. I then ran the evolutionary search with a hardware latency constraint of 200 ms. After training the resulting subTransformer, I measured its latency at 270 ms, which is much higher than the predicted latency. Why does this happen?

Hi leo038,

Thanks for your question! Yes, it includes the embedding lookup and the last layer. I can think of several possible reasons for the gap:

  1. The measured latency should be averaged over many runs to reduce variance.
  2. The latency dataset may contain too few samples to cover a wide range of subTransformer architectures.
  3. Did you split the data into separate training, validation, and test sets when training the latency predictor, and did you observe high accuracy on the test set?

Best,
Hanrui
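
As an illustration of the first and third points above, here is a minimal sketch in plain Python. It is not code from this repository: `measure_latency_ms`, `split_dataset`, and `mean_relative_error` are hypothetical helper names, and the warm-up count, run count, and split fractions are arbitrary assumptions. On an asynchronous device such as a GPU, the timed callable would also need to block until the work has finished, otherwise the timer stops too early.

```python
import random
import statistics
import time


def measure_latency_ms(fn, warmup=10, runs=50):
    """Return (mean, stdev) of fn's wall-clock latency in milliseconds."""
    # Warm-up calls are discarded so one-time costs (caches, lazy
    # initialization) do not skew the average.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)


def split_dataset(samples, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle samples and split them into (train, valid, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_valid = int(len(shuffled) * valid_frac)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test


def mean_relative_error(predicted, measured):
    """Average of |predicted - measured| / measured over paired values."""
    return statistics.mean(
        abs(p - m) / m for p, m in zip(predicted, measured)
    )


if __name__ == "__main__":
    # Stand-in workload; replace with a call that runs the model once.
    mean_ms, std_ms = measure_latency_ms(lambda: sum(range(100_000)))
    print(f"latency: {mean_ms:.3f} ms (stdev {std_ms:.3f} ms)")

    train, valid, test = split_dataset(range(1000))
    print(len(train), len(valid), len(test))
```

The predictor would be fit on `train`, tuned on `valid`, and judged only on `test`, using `mean_relative_error` between predicted and measured latencies. For scale, a predicted 200 ms against a measured 270 ms is a relative error of about 26%.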

Hi leo038,

I will close the issue for now. Feel free to reopen if you have any further questions!

Best,
Hanrui