taoyang1122/adapt-image-models

Training efficiency

Closed this issue · 2 comments

I found that in practical applications, your method does not significantly improve training efficiency. Although the number of trainable parameters is much smaller, backpropagation still requires computing gradients through almost all layers. As a result, there is no significant reduction in memory usage or training time.

Hi, we had some discussion about memory usage and training time in Table 6. Yes, we still need gradients for almost all layers, so the reduction in training time and memory is not as significant as the reduction in the number of parameters. In fact, this is a problem shared by existing common PEFT methods such as Adapter, Prompt Tuning, LoRA, etc. There is prior work discussing this problem: https://arxiv.org/abs/2206.06522. Further improving the efficiency could be a future direction.
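For context, here is a minimal, hypothetical PyTorch sketch (not this repository's code; the `Adapter`/`Block` modules and dimensions are made up) illustrating the point: even when only the adapter parameters have `requires_grad=True`, the backward pass still has to traverse the frozen layers to reach adapters earlier in the network, so activation memory and backward compute stay largely unchanged.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: the only trainable module in each block."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.down = nn.Linear(dim, hidden)
        self.up = nn.Linear(hidden, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual adapter

class Block(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.frozen = nn.Linear(dim, dim)  # stands in for a pretrained layer
        self.adapter = Adapter(dim)

    def forward(self, x):
        return self.adapter(torch.relu(self.frozen(x)))

model = nn.Sequential(*[Block(512) for _ in range(12)])

# Freeze everything except the adapters.
for name, p in model.named_parameters():
    p.requires_grad = "adapter" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable / total:.1%} of parameters")

# The backward pass still traverses every frozen layer, because the adapter in
# block 0 can only receive gradients through the frozen layers of blocks 1..11,
# so their activations must be kept in memory for this call.
x = torch.randn(8, 512)
loss = model(x).sum()
loss.backward()
```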

Reducing the number of trainable parameters also has other benefits in applications such as communication-efficient distributed learning and privacy-preserving federated learning. It also makes it easier to store checkpoints of multiple large models for different tasks, since only the small set of task-specific parameters needs to be saved per task, as sketched below.
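On that last point, a minimal sketch (continuing the toy model above; the file name is hypothetical) of saving only the task-specific adapter weights so that many tasks can share one frozen backbone:

```python
import torch

# Save only the adapter weights for this task; the frozen backbone is shared
# and stored once, so each additional task costs only a small checkpoint.
adapter_state = {k: v for k, v in model.state_dict().items() if "adapter" in k}
torch.save(adapter_state, "task_A_adapters.pth")

# Later: load the shared backbone once, then swap in the small per-task file.
model.load_state_dict(torch.load("task_A_adapters.pth"), strict=False)
```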

Thank you for your reply.