krasserm opened this issue 2 years ago · 0 comments
Mainly required for Perceiver AR training, to reduce the GPU memory consumption of the initial cross-attention.