Does the generated latency include the embedding lookup table and the last output layers?
leo038 opened this issue · 2 comments
leo038 commented
According to the code, the generated latency should include the embedding lookup table and the last output layers. However, I ran into a problem: I trained a latency predictor, and it is very accurate. I then ran the evolutionary search with a hardware latency constraint of 200 ms. After training the resulting subTransformer, I measured its latency at 270 ms, which is much higher than the predicted latency. Why does this happen?
Hanrui-Wang commented
Hi leo038,
Thanks for your question! Yes, it includes the embedding lookup and the last layer. I think there could be several reasons for the gap:
- The measured latency should be averaged over many runs to reduce variance.
- The latency dataset may contain too few samples, so it does not cover a wide enough range of subTransformer architectures.
- Did you split the data into separate train, validation, and test sets when training the latency predictor, and is the high accuracy you observe measured on the test set?
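To illustrate the first point, here is a minimal sketch of an averaged measurement with warm-up runs. The names `measure_latency_ms` and `dummy_forward` are hypothetical and are not functions from this repository; `dummy_forward` is only a stand-in for one forward pass of the trained subTransformer, and the warm-up and repeat counts are arbitrary.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=10, repeats=50):
    """Return (mean, stdev) of the wall-clock latency of fn() in milliseconds.

    Hypothetical helper: the warm-up calls are discarded so that one-time
    costs (caching, lazy initialization) do not inflate the average.
    """
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    return statistics.mean(samples), statistics.stdev(samples)


def dummy_forward():
    # Stand-in workload; replace with one inference pass of the model.
    sum(i * i for i in range(10000))


mean_ms, std_ms = measure_latency_ms(dummy_forward)
print(f"latency: {mean_ms:.3f} ms +/- {std_ms:.3f} ms over 50 runs")
```

If the model runs on a GPU, the timed function would also need to wait for the device to finish (for example by calling `torch.cuda.synchronize()` in PyTorch) before the timer stops, otherwise only the kernel launch time is measured. A large standard deviation relative to the mean suggests a single measurement like 270 ms may not be representative.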
Best,
Hanrui
Hanrui-Wang commented
Hi leo038,
I will close the issue for now. Feel free to reopen if you have any further questions!
Best,
Hanrui