Zero-division error when args.n_layer = 1, caused by ratio_0_to_1. Can I set ratio_0_to_1 = 0 when n_layer = 1?
zdxdsw commented
Can you intuitively explain what `ratio_0_to_1` is doing in `RWKV_Tmix_x060`?
https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v5/src/model.py#L290
I find that `ratio_0_to_1` is defined by `ratio_0_to_1 = layer_id / (args.n_layer - 1)`.
It is then used to define multiple things for `time_mix` and `time_decay`.
However, my issue is that I want to set `args.n_layer = 1`, which makes the denominator `args.n_layer - 1` zero and leads to the zero-division error.
Does it make sense to hardcode `ratio_0_to_1 = 0` when `args.n_layer = 1`?
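
For concreteness, the workaround I have in mind looks roughly like the sketch below. `safe_ratio_0_to_1` is a hypothetical helper name, not something in the repo, and returning 0 for the single-layer case is my own assumption (it treats the only layer as the "first" layer); I don't know whether that matches the intended initialization.

```python
def safe_ratio_0_to_1(layer_id: int, n_layer: int) -> float:
    """Relative depth of a layer: 0.0 for the first layer, 1.0 for the last.

    Hypothetical helper, not part of RWKV-LM. It wraps the original
    expression layer_id / (n_layer - 1) with a guard for n_layer == 1.
    """
    if n_layer <= 1:
        # Assumption: with a single layer, treat it as the first layer (ratio 0).
        return 0.0
    return layer_id / (n_layer - 1)


if __name__ == "__main__":
    # Single-layer model: no ZeroDivisionError, ratio falls back to 0.0.
    print(safe_ratio_0_to_1(0, 1))
    # Three-layer model: same values as the original expression.
    print([safe_ratio_0_to_1(i, 3) for i in range(3)])
```

With more than one layer this gives the same values as the current expression, so only the `n_layer = 1` case changes.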