[Question] the 'variable-length attention operator in flash attention'
Closed this issue · 2 comments
jungle-gym-ac commented
Hi there! The paper mentions using the "variable-length attention operator provided in flash attention (Dao et al., 2022) to compute the attention for each visual input within the batch independently". However, I read the code here and could not find anything related to this variable-length attention operator; the high-resolution features seem to be encoded with a plain for loop instead. Did I miss something?
Thank you!
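For context, this is roughly what I expected to see. A minimal sketch, assuming the `flash-attn` package's `flash_attn_varlen_func` is what the paper refers to (the sequence lengths, head counts, and shapes below are made up for illustration, not taken from this repo): tokens from all images are packed into one unpadded tensor, and a cumulative-length tensor tells the kernel where each image starts and ends, so attention never crosses image boundaries.

```python
import torch
from flash_attn import flash_attn_varlen_func

# Three "images" whose patch sequences have different lengths (illustrative).
seqlens = [256, 576, 1024]
total = sum(seqlens)
nheads, headdim = 8, 64

# Packed (unpadded) layout: all tokens from all images concatenated on dim 0.
q = torch.randn(total, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn(total, nheads, headdim, device="cuda", dtype=torch.float16)
v = torch.randn(total, nheads, headdim, device="cuda", dtype=torch.float16)

# Cumulative sequence lengths [0, 256, 832, 1856] mark each image's
# start/end offsets; this is how the kernel keeps the images independent.
cu_seqlens = torch.zeros(len(seqlens) + 1, dtype=torch.int32, device="cuda")
cu_seqlens[1:] = torch.cumsum(torch.tensor(seqlens, device="cuda"), dim=0)

out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=max(seqlens), max_seqlen_k=max(seqlens),
)
# out: (total, nheads, headdim); each token only attended to tokens of its
# own image -- equivalent to looping over images, but in one fused kernel.
```

If I understand correctly, this gives the same result as running attention per image in a loop, just without the padding or the repeated kernel launches.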
liuzuyan commented
jungle-gym-ac commented
Ah thanks! I read the code again and figured it out.