What are the C and P dimensions?
Rex, in your paper you refer to the C (or C^k) dimension, but I can't find a reference as to what this C is. Is it the embedding dimension?
Also, the code refers to a value P, as in B x CK x [HW/P] - Query keys. I'm assuming HW is image height and width, but what is P?
I'm working on strategies to reduce Cutie's memory requirements for high resolution images, but the dimensionality of the similarity/affinity matrix is really severe, so I'm looking for any opportunities to reduce this.
Hi.
In code, C in isolation denotes some channel size -- the exact meaning is context-dependent. In the paper, C is a shared channel size for most of the operations, except the key tensor (which is C^k). See Cutie/cutie/config/model/base.yaml (lines 4 to 8 in 2ac7ac2), where C^k is 64, and all the other 256 jointly refer to C. We experimented with different values before (and thus allowed the config to set them differently) but just found that it's easier to tie them to a single value.
For P, it is a value inherited from XMem. It denotes the number of prototypes (Section 3.3 of XMem). Semantically, [HW/P] denotes the total number of query elements. During memory reading, it would be the number of pixels HW, and during memory potentiation, it would be the number of prototypes.
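To make the two readings of [HW/P] concrete, here is a rough shape-arithmetic sketch in plain Python. It is not taken from the Cutie code: the helper names `query_length` and `affinity_elements`, the feature stride of 16, the prototype count of 128, and the idea that the affinity matrix pairs every memory element with every query element are all assumptions made for illustration.

```python
def query_length(height, width, num_prototypes=None, stride=16):
    """Size of the [HW/P] axis (hypothetical helper, not a Cutie function).

    If num_prototypes is given, the queries are prototypes, so the axis
    has length P. Otherwise the queries are feature-map pixels, so the
    axis has length H * W at the assumed feature stride.
    """
    if num_prototypes is not None:
        return num_prototypes
    return (height // stride) * (width // stride)


def affinity_elements(batch, num_memory, num_query):
    """Element count of an assumed B x N x [HW/P] affinity matrix."""
    return batch * num_memory * num_query


# Memory reading: one query per feature-map pixel.
q_read = query_length(480, 864)  # 30 * 54 = 1620

# Memory potentiation: one query per prototype (128 is an illustrative value).
q_proto = query_length(480, 864, num_prototypes=128)

# Assumed memory holding 5 frames of pixel-level keys.
n_memory = 5 * q_read
print(affinity_elements(1, n_memory, q_read))
print(affinity_elements(1, n_memory, q_proto))
```

Under these assumptions the pixel-query case scales with the square of the feature-map area, while the prototype-query case scales only linearly with it, which is why the pixel-level affinity is the expensive one at high resolution.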
Ah, so [HW/P] means HW or P, not HW divided by P. I see, thank you.