Question about Efficiency
ToheartZhang opened this issue · 2 comments
Hello, thanks for your insightful work!
I noticed that the steered heads are selected by iteratively steering each head, using 1000 samples per task. For LLaMA-7B, that amounts to 1000x32x32 generations per task, which seems very time-consuming. Is there anything I missed?
Thanks!
Hello Jason,
Thanks for your great question. Regarding profiling efficiency, the answer is yes: the total number of generations is 1000x32x32, which is not negligible. However, compared to fine-tuning, inference-only profiling is still efficient and acceptable. For example, a single fine-tuning trial of 32000 steps with a typical batch size of 32 requires the same number of forward passes (plus the additional backward passes). Moreover, fine-tuning usually needs multiple trials to tune hyperparameters, and the resulting model applies to only one task. In contrast, model profiling is conducted only once per model and can be reused across multiple tasks.
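The cost comparison above can be checked with some back-of-envelope arithmetic (the step count and batch size are the illustrative figures from this reply, not measurements):

```python
# Profiling cost: one generation per (example, layer, head) triple.
NUM_SAMPLES = 1000   # evaluation examples per task
NUM_LAYERS = 32      # transformer layers in LLaMA-7B
NUM_HEADS = 32       # attention heads per layer

profiling_forward_passes = NUM_SAMPLES * NUM_LAYERS * NUM_HEADS

# Fine-tuning cost: one forward pass per example per step
# (each also incurs a backward pass, not counted here).
FT_STEPS = 32000
FT_BATCH = 32
finetune_forward_passes = FT_STEPS * FT_BATCH

print(profiling_forward_passes)   # 1,024,000
print(finetune_forward_passes)    # 1,024,000, before backward passes
```

So a single fine-tuning trial already matches the profiling budget in forward passes alone, and profiling is amortized across tasks.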
In our experiments, we also observe that the number of evaluation examples can be further reduced; for instance, |D|=100 yields a head ranking close to that of the full set. Nevertheless, further improving the efficiency of the profiling algorithm would definitely be an interesting direction :)
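A minimal sketch of the per-head profiling loop and of the |D|=100 observation. The scoring function here is a synthetic stub (per-head signal plus per-example noise), standing in for actually steering one head and scoring the generations, so the helpers and numbers are illustrative only:

```python
import random

NUM_LAYERS, NUM_HEADS = 32, 32

def profile_heads(eval_set, score_fn):
    """Score every (layer, head) pair on eval_set and rank heads by mean score."""
    scores = {}
    for layer in range(NUM_LAYERS):
        for head in range(NUM_HEADS):
            scores[(layer, head)] = sum(
                score_fn(layer, head, x) for x in eval_set
            ) / len(eval_set)
    return sorted(scores, key=scores.get, reverse=True)

# Stub: each head has a fixed underlying "usefulness" signal; every
# evaluation example adds deterministic pseudo-random noise to it.
rng = random.Random(0)
signal = {(l, h): rng.random() for l in range(NUM_LAYERS) for h in range(NUM_HEADS)}

def noisy_score(layer, head, x):
    noise = random.Random(x * 100000 + layer * NUM_HEADS + head).gauss(0, 0.1)
    return signal[(layer, head)] + noise

full = profile_heads(range(1000), noisy_score)   # |D| = 1000
small = profile_heads(range(100), noisy_score)   # |D| = 100

# The top heads largely agree between the two evaluation-set sizes.
overlap = len(set(full[:32]) & set(small[:32]))
print(overlap)
```

With averaging over even 100 examples, the per-head noise shrinks enough that the top of the ranking is stable, which is the intuition behind reducing |D|.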
Hope this can answer your questions.
Thanks.
Thanks for your detailed explanation and excellent work!