hubertsiuzdak/snac

Training with attention or not

Naminwang opened this issue · 0 comments

Hi, looking at the model config on Hugging Face used for prediction, I noticed that attn_window_size is null. Does that mean attention was not used during training either? Also, could you share some training details, such as the learning rate (lr), the size of the training data, and so on?
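For reference, a minimal sketch of how one might check this field in a downloaded config. It assumes that a JSON null for attn_window_size means the model is built without attention layers. That assumption is exactly what the question above asks about, so it is not confirmed here. The helper name uses_local_attention is hypothetical.

```python
import json


def uses_local_attention(config_text: str) -> bool:
    """Return True if the config sets a window size for attention.

    Assumption (unconfirmed): a JSON null, which becomes None in Python,
    for attn_window_size means no attention layers are built.
    """
    config = json.loads(config_text)
    # A missing key is treated the same as null here.
    return config.get("attn_window_size") is not None


# Hypothetical config excerpt. Only the attn_window_size field comes from the issue.
example = '{"attn_window_size": null}'
print(uses_local_attention(example))  # False
```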