hkchengrex/XMem

Question about the algorithm and training procedure


zzzc18 commented

Hi Cheng,

I'm new to the VOS area and after reading the paper I've still got two questions about the algorithm.

  1. Does the readout in XMem (and in Cutie) effectively turn the VOS task from learning an $\text{img}\rightarrow\text{mask}$ map into learning a $\text{similar img}\rightarrow\text{similar mask}\rightarrow\text{mask}$ map through the retrieval process, at the local feature level? (A rough sketch of what I mean is included after this list.)
  2. Is the long-term memory module not involved in the training process, i.e., is it only used at test time? As you state in the paper, the training sequences are of length eight, which is shorter than $T_{max}=10$.
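To be concrete about what I mean by "retrieval": below is a minimal, self-contained sketch of an attention-style memory readout, using dot-product affinity as a simplification (the function and tensor names are my own, not XMem's actual code, which uses a different similarity measure).

```python
import torch
import torch.nn.functional as F

def memory_readout(query_key, memory_key, memory_value):
    """Retrieve a feature for each query location as a similarity-weighted
    mixture of stored memory values (simplified dot-product variant).

    query_key:    (B, C_k, H*W)    keys of the current frame
    memory_key:   (B, C_k, T*H*W)  keys of the memorized frames
    memory_value: (B, C_v, T*H*W)  mask-aware values of the memorized frames
    returns:      (B, C_v, H*W)    readout feature for the current frame
    """
    # Affinity between every query location and every memory location.
    affinity = torch.einsum('bcq,bcm->bqm', query_key, memory_key)      # (B, HW, THW)
    # Each query location attends over all memory locations.
    weights = F.softmax(affinity / query_key.shape[1] ** 0.5, dim=-1)
    # Weighted sum of memory values: "similar img -> similar mask" retrieval.
    return torch.einsum('bqm,bcm->bcq', weights, memory_value)          # (B, C_v, HW)

# Toy usage: 4 memorized frames of 16x16 features.
B, Ck, Cv, HW, T = 1, 64, 512, 16 * 16, 4
qk = torch.randn(B, Ck, HW)
mk = torch.randn(B, Ck, T * HW)
mv = torch.randn(B, Cv, T * HW)
out = memory_readout(qk, mk, mv)   # (1, 512, 256), fed to the decoder
```

So in this view the current frame's features never map to a mask directly; they first pull in mask-aware features from memorized frames, and the decoder turns that readout into the mask.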

Thank you for taking the time to read this issue. I greatly appreciate any advice you can provide.

hkchengrex commented

  1. That is more about STCN; I think you can look at it that way at a high level.
  2. The long-term memory is used at test time only (see the sketch below).
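To illustrate why an 8-frame training clip never exercises the long-term memory, here is a minimal sketch of the working-memory overflow policy, assuming the paper's default bounds ($T_{min}=5$, $T_{max}=10$); the function, constants, and `mem_every` parameter are illustrative, not the actual implementation.

```python
T_MAX = 10  # upper bound on working-memory frames (value discussed above)
T_MIN = 5   # frames kept in working memory after consolidation (assumed paper default)

def run_memory_policy(num_frames, mem_every=1):
    """Count how often working memory overflows into long-term memory."""
    working = []            # indices of frames currently in working memory
    consolidations = 0
    for t in range(num_frames):
        if t % mem_every == 0:
            working.append(t)              # memorize this frame
        if len(working) > T_MAX:
            working = working[-T_MIN:]     # move older frames to long-term memory
            consolidations += 1
    return consolidations

print(run_memory_policy(8))    # 0  -> an 8-frame training clip never overflows
print(run_memory_policy(300))  # >0 -> long test-time videos do
```
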
zzzc18 commented

Thank you for your reply!