krasserm/perceiver-io

Configurable number of attention heads to be processed in parallel

krasserm opened this issue

Mainly required for Perceiver AR training, to reduce GPU memory consumption in the initial cross-attention layer, where the key/value sequence (the input prefix) can be very long and the per-head attention matrix becomes correspondingly large.
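A minimal sketch of the idea, assuming a PyTorch-style cross-attention module. The module and the parameter name `max_heads_parallel` are hypothetical illustrations of the feature request, not the library's actual API: instead of materializing the attention matrices of all heads at once, only a configurable number of heads are processed at a time.

```python
# Hypothetical sketch: cross-attention that processes heads in configurable
# chunks. `max_heads_parallel` is an assumed parameter name for illustration.
from typing import Optional

import torch
import torch.nn as nn


class ChunkedCrossAttention(nn.Module):
    def __init__(self, num_heads: int, dim: int, max_heads_parallel: Optional[int] = None):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # If no limit is given, process all heads at once (standard behavior).
        self.max_heads_parallel = max_heads_parallel or num_heads
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.o_proj = nn.Linear(dim, dim)

    def forward(self, x_q: torch.Tensor, x_kv: torch.Tensor) -> torch.Tensor:
        b, n_q, _ = x_q.shape
        n_kv = x_kv.shape[1]

        # Shape: (batch, heads, seq, head_dim)
        q = self.q_proj(x_q).view(b, n_q, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x_kv).view(b, n_kv, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x_kv).view(b, n_kv, self.num_heads, self.head_dim).transpose(1, 2)

        out_chunks = []
        # Only `max_heads_parallel` heads materialize their (n_q x n_kv)
        # attention matrix at any time, which bounds peak GPU memory.
        for q_c, k_c, v_c in zip(
            q.split(self.max_heads_parallel, dim=1),
            k.split(self.max_heads_parallel, dim=1),
            v.split(self.max_heads_parallel, dim=1),
        ):
            attn = torch.softmax(q_c @ k_c.transpose(-2, -1) / self.head_dim**0.5, dim=-1)
            out_chunks.append(attn @ v_c)

        out = torch.cat(out_chunks, dim=1).transpose(1, 2).reshape(b, n_q, -1)
        return self.o_proj(out)
```

The result is identical to processing all heads in parallel; the trade-off is peak memory versus some extra sequential compute, which matters most for the initial Perceiver AR cross-attention over a long prefix.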