wenwenyu/TCM

Where is the implementation of Meta Query in the code?


I'm confused about the implementation of the Language Prompt Module. According to Figure 4 and Sec. 3.2.3, a Meta Query is learned to generate the implicit conditional cue cc via the Language Prompt Module. However, according to Fig. 5 and the code below, the conditional cue cc seems to be generated from the global image feature rather than from a Meta Query. These two descriptions look contradictory to me.
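To make my reading of Figure 4 / Sec. 3.2.3 concrete, this is roughly what I expected to find in the code (a minimal sketch only; `meta_query` and `language_prompt_module` are hypothetical names and layers, not taken from the repo):

```python
import torch
import torch.nn as nn

B, C = 2, 512  # batch size and embedding dim, arbitrary values for illustration

# A learnable Meta Query that does not depend on the input image,
# decoded by the Language Prompt Module into the conditional cue cc.
meta_query = nn.Parameter(torch.randn(1, C))          # hypothetical learnable query
language_prompt_module = nn.Linear(C, C)              # stand-in for the actual module
cc = language_prompt_module(meta_query).expand(B, -1)  # (B, C), shared across all images
```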

So my question is: where can I find the implementation of the Meta Query in the code? And, relatedly, what is the difference between the CoOp-like learnable prompts described in Sec. 3.2.2 and the Meta Query?
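For reference, my understanding of the CoOp-like learnable prompts in Sec. 3.2.2 is roughly the following (again a sketch with assumed names and shapes, not the repo's code):

```python
import torch
import torch.nn as nn

N, D, K = 4, 512, 1  # N learnable context tokens, token-embedding dim D, K classes

# CoOp-style prompt: N learnable context embeddings shared by all classes,
# prepended to the (frozen) token embedding of each class name before the
# CLIP text encoder; only `contexts` is optimized.
contexts = nn.Parameter(torch.randn(1, N, D))
class_name_emb = torch.randn(K, 1, D)  # e.g. the embedding of the word "text"
prompt = torch.cat([contexts.expand(K, -1, -1), class_name_emb], dim=1)  # (K, N + 1, D)
```

The snippet below is the part of the repo I am referring to: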

# texts is the set of class-name embeddings, contexts is the learnable prompt embedding
prompt_gen = None
# text prompting
if self.prompt_generator is not None:  # text prompt generator
    prompt_gen = self.prompt_generator(global_feat)  # (B, C)
contexts = self.contexts if self.use_learnable_prompt else None  # (1, N, C)
# (B, K, C), last time step t as output, (BKLC -> BKC)
# (1, K, D) -> (B, K, D)
text_embeddings = self.text_encoder(
    self.texts.to(global_feat.device),
    contexts,
    use_learnable_prompt_only=self.use_learnable_prompt_only,
    prompt_gen=prompt_gen).expand(B, -1, -1)
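
From this snippet, my guess (an assumption on my part, not something I verified in the repo) is that self.prompt_generator is just a small network over the pooled image feature, so the cue fed to the text encoder is image-conditioned rather than produced from a separate learned query, e.g.:

```python
import torch
import torch.nn as nn

B, C_vis, C_txt = 2, 1024, 512  # visual / text embedding dims, arbitrary values

# Hypothetical prompt generator: maps the global image feature to a cue that is
# later injected on the text-encoder side (e.g. added to the prompt embeddings).
prompt_generator = nn.Sequential(
    nn.Linear(C_vis, C_vis // 4),
    nn.ReLU(inplace=True),
    nn.Linear(C_vis // 4, C_txt),
)
global_feat = torch.randn(B, C_vis)
prompt_gen = prompt_generator(global_feat)  # (B, C_txt) -- what I read as cc in Fig. 5
```

If that reading is correct, I cannot see where a standalone Meta Query parameter enters this path, hence my question.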
