Question about STQI head method and implementation
Closed this issue · 0 comments
fbragman commented
Hi,
Thanks for uploading the code and for a great paper. I have a few questions about the method: I've been reading the paper but found it difficult to work out the implementation from the codebase.
- The STQI decoder has a `DynConv` layer for all `N_H` STQI heads. Is this `DynConv` layer within each STQI head the same as in QueryInst? i.e. `q_t <-- DynConv_box(p_box, q_t-1)`, where `p_box` are ROI-pooled instance features.
- In the STQI figure in the paper (Figure 1) there is just 1 `Dynamic Conv` per head. In `QueryInst` there are both dynamic mask and dynamic box layers for each stage. Can you confirm there is only `DynConv_box` in STQI?
- The features from either `MsgShiftT` or `Swin` are multi-scale. How are the multi-resolution features dealt with in `DynConv_box` or `DynConv_mask`? I can't find this information in the manuscript. Do you make predictions for every scale like in an FPN network?
- Do the `N_H` STQI-heads replace the 6 stages you might have in `QueryInst`?
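
To make the first question concrete, this is roughly how I understand the update `q_t <-- DynConv_box(p_box, q_t-1)`: the previous query generates the weights of two 1x1 convolutions, which are applied to the ROI-pooled features and then projected back to a query. It is only a plain-Python sketch under my own assumptions, not taken from the STQI or QueryInst code. The names `dyn_conv_box`, `w_gen`, `w_out`, the toy sizes, and the omission of normalization layers are all hypothetical.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    """Element-wise ReLU on a nested list."""
    return [[max(0.0, v) for v in row] for row in m]

def rand_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def dyn_conv_box(p_box, q_prev, w_gen, w_out, d):
    """Assumed query update for one instance.

    p_box:  S x C ROI-pooled features (S spatial positions, C channels)
    q_prev: length-C query from the previous stage (q_t-1)
    w_gen:  C x (2*C*d) weights mapping the query to two dynamic kernels
    w_out:  (S*C) x C output projection
    """
    c = len(q_prev)
    # The query generates the parameters of both dynamic 1x1 kernels.
    params = matmul([q_prev], w_gen)[0]
    k1 = [params[i * d:(i + 1) * d] for i in range(c)]                    # C x d
    offset = c * d
    k2 = [params[offset + i * c:offset + (i + 1) * c] for i in range(d)]  # d x C
    # Apply the two dynamic kernels to the ROI features (normalization omitted).
    feats = relu(matmul(p_box, k1))   # S x d
    feats = relu(matmul(feats, k2))   # S x C
    # Flatten the spatial grid and project back to a length-C query (q_t).
    flat = [v for row in feats for v in row]
    return matmul([flat], w_out)[0]

rng = random.Random(0)
C, D, S = 8, 4, 9   # toy sizes chosen only for illustration
w_gen = rand_matrix(C, 2 * C * D, rng)
w_out = rand_matrix(S * C, C, rng)
p_box = rand_matrix(S, C, rng)
q_prev = [rng.uniform(-1.0, 1.0) for _ in range(C)]
q_t = dyn_conv_box(p_box, q_prev, w_gen, w_out, D)
```

If STQI does something different inside each head (for example a mask branch as well, or a different way of feeding `p_box`), that difference is exactly what I am trying to understand.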
Many thanks!