Training with attention or not
Naminwang opened this issue · 0 comments
Naminwang commented
Hi, when I looked at the config for model prediction on Hugging Face, attn_window_size is null, so I wonder whether attention was used during training. Also, could you share some training details, such as the learning rate (lr), the size of the training data, and so on?