BlinkDL/RWKV-LM

Zero-division error when args.n_layer = 1, caused by ratio_0_to_1. Can I set ratio_0_to_1 = 0 when n_layer = 1?


Can you intuitively explain what ratio_0_to_1 is doing in RWKV_Tmix_x060?
https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v5/src/model.py#L290

I find that ratio_0_to_1 is defined by: ratio_0_to_1 = layer_id / (args.n_layer - 1)
It is then used when defining several of the time_mix and time_decay values.
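To make the definition concrete, here is a small standalone sketch in plain Python (not the repo's code; the function wrapper and the choice of 4 layers are only for illustration) showing how the value changes across layers:

```python
def ratio_0_to_1(layer_id: int, n_layer: int) -> float:
    # Same expression as in the issue: 0 for the first layer, 1 for the last.
    return layer_id / (n_layer - 1)

n_layer = 4  # illustrative value, not taken from any config
ratios = [ratio_0_to_1(layer_id, n_layer) for layer_id in range(n_layer)]
print(ratios)  # goes from 0.0 at layer 0 up to 1.0 at layer 3
```

So as far as I can tell it is a per-layer position that goes linearly from 0 at the first layer to 1 at the last layer.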

However, I want to set args.n_layer = 1, which makes the denominator (args.n_layer - 1) equal to 0 and raises a zero-division error.
Does it make sense to hardcode ratio_0_to_1 = 0 when args.n_layer = 1?
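Concretely, the change I have in mind is something like the following. This is only a sketch: `safe_ratio_0_to_1` is a hypothetical helper name, not something that exists in the repo, and I am assuming that treating a single layer as the "first layer" (ratio 0) is acceptable.

```python
def safe_ratio_0_to_1(layer_id: int, n_layer: int) -> float:
    # Assumption: with a single layer there is no first-to-last range,
    # so fall back to 0.0 instead of dividing by zero.
    if n_layer <= 1:
        return 0.0
    # Unchanged formula for models with two or more layers.
    return layer_id / (n_layer - 1)

print(safe_ratio_0_to_1(0, 1))  # single-layer case, no ZeroDivisionError
print(safe_ratio_0_to_1(3, 4))  # last layer of a 4-layer model
```

For n_layer > 1 this gives the same values as the current formula, so only the single-layer case changes. I am not sure whether 0 is the right value for the initialization in that case, hence the question.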