Oryx-mllm/Oryx

[Question] the 'variable-length attention operator in flash attention'


Hi there! The paper mentions using the "variable-length attention operator provided in flash attention (Dao et al., 2022) to compute the attention for each visual input within the batch independently". However, when I read the code here, I could not find anything related to this variable-length attention operator, and it looks like the high-resolution features are encoded with a for loop instead. Did I miss something?
Thank you!

Hi, thanks for your interest in our work! In the code you mentioned, we pre-process the input images into a list, and then forward the whole list to OryxViT for batched computation here. The variable-length attention is applied here. Feel free to ask should you have further questions!
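
For readers landing on this issue later, here is a minimal sketch of how variable-length attention batches differently sized inputs via `flash_attn_varlen_func` from the flash-attn library. The sequence lengths, head counts, and tensor names are illustrative assumptions, not the exact Oryx code: the point is that sequences are packed into one tensor and `cu_seqlens` delimits them, so one kernel call attends over each input independently.

```python
import torch
from flash_attn import flash_attn_varlen_func

# Illustrative: three visual inputs with different token counts,
# packed into a single tensor instead of padded to a common length.
seq_lens = [196, 576, 1024]  # tokens per image (hypothetical values)
nheads, headdim = 16, 64

total = sum(seq_lens)
q = torch.randn(total, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# cu_seqlens marks where each sequence starts/ends in the packed tensor:
# here it is [0, 196, 772, 1796].
cu_seqlens = torch.zeros(len(seq_lens) + 1, device="cuda", dtype=torch.int32)
cu_seqlens[1:] = torch.tensor(seq_lens, device="cuda").cumsum(0)

# One kernel call computes attention per sequence independently, so
# tokens from different images never attend to each other.
out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens,
    cu_seqlens_k=cu_seqlens,
    max_seqlen_q=max(seq_lens),
    max_seqlen_k=max(seq_lens),
)
# out has shape (total, nheads, headdim), split back per image via seq_lens.
```

Compared with a Python-level for loop over images, this avoids both padding waste and per-image kernel launches while still keeping the attention of each visual input separate.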

Ah thanks! I read the code again and figured it out.